From owner-freebsd-fs@FreeBSD.ORG  Sun Jul 24 17:34:42 2011
From: Ben Kaduk
Date: Sun, 24 Jul 2011 13:04:28 -0400
To: freebsd-fs@freebsd.org
Subject: Re: svn commit: r220241 - in stable/8/sys: kern sys

[replying to -fs since that is where the original discussion of adding
O_CLOEXEC occurred]

On Fri, Apr 1, 2011 at 8:50 AM, Konstantin Belousov wrote:
> Author: kib
> Date: Fri Apr  1 12:50:29 2011
> New Revision: 220241
> URL: http://svn.freebsd.org/changeset/base/220241
>
> Log:
>   MFC r219999:
>   Add O_CLOEXEC flag to open(2) and fhopen(2).

I saw mail go by on debian-bsd@lists.debian.org that they are going to
pick up on these O_CLOEXEC definitions and export them, which included
the comment:
    No O_SEARCH yet, since FreeBSD doesn't seem to implement it.

Would there be any reason for us to support O_SEARCH?
http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
does not make it very clear to me whether we would want to....

-Ben Kaduk
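A minimal sketch of what the merged flag provides (illustrative only, not
code from the commit; the file path below is arbitrary): O_CLOEXEC marks
the descriptor close-on-exec atomically at open time, so a threaded
program that forks and execs does not need the separate, racy
fcntl(F_SETFD) step.

        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int
        main(void)
        {
                int fd = open("/etc/motd", O_RDONLY | O_CLOEXEC);

                if (fd == -1) {
                        perror("open");
                        return (1);
                }
                /* FD_CLOEXEC is already set; no extra fcntl(F_SETFD) call. */
                printf("close-on-exec: %s\n",
                    (fcntl(fd, F_GETFD) & FD_CLOEXEC) ? "yes" : "no");
                close(fd);
                return (0);
        }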
From owner-freebsd-fs@FreeBSD.ORG  Sun Jul 24 18:00:36 2011
From: Willem Jan Withagen
Date: Sun, 24 Jul 2011 20:00:31 +0200
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS and large directories - caveat report

On 21-7-2011 19:18, Ivan Voras wrote:
> On 21 July 2011 18:38, Luiz Otavio O Souza wrote:
>
>> The general usage on this server is fine, but the periodic (daily)
>> scripts take almost a day to complete and the server is slow as hell
>> while the daily scripts are running.
>
> Yes, this is how my problem was first diagnosed.
>
>> So, yes, i can confirm that running 'find' on a ZFS FS with a lot of
>> files is very, very slow (and looks like it isn't related to how the
>> files are distributed on the FS).
>
> Only it's not just "find" - it's any directory operations - including
> file creation and removal. I cannot say that is not related to how
> files are distributed on the file system, except the unusually long
> operations on the parent of the shard directories in my case.

A little late in the thread:

Running on 8.2-STABLE, ZFS version 15.
Quad core, 8 GB memory, /home is on a 6-disk (SATA) raidz2 filesystem.

The directory is a 3-week revolving log of images taken from a security
cam, so if anything its directory file should be horribly thrashed.
It holds around 170,000 files in one directory.

[/home/sonycam] wjw@zfs.digiware.nl> ls periodical | wc
  177364  177364 3369916
0.421u 6.999s 2:00.15 6.1%      37+1522k 0+0io 0pf+0w
[/home/sonycam] wjw@zfs.digiware.nl> ls -asl periodical | wc
  177401 1774002 13659785
1.747u 11.087s 1:42.98 12.4%    36+1562k 0+0io 0pf+0w

Repeated finds after this complete within 10 secs.

On average I see about 100 IOPS/disk and reading is at 5 MByte/disk.
But I do not feel the system is really loaded while doing the ls:
I can easily log in again and do other work.

--WjW
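For anyone who wants to reproduce this kind of measurement, a small sketch
that counts entries with readdir(3) alone, which separates raw directory
traversal from the extra per-file stat(2) traffic that "ls -asl" adds (the
path argument is whatever large directory you want to test):

        #include <dirent.h>
        #include <err.h>
        #include <stdio.h>

        /*
         * Roughly what "ls | wc" does, minus the sorting and without the
         * per-file stat(2) calls that "ls -asl" performs.
         */
        int
        main(int argc, char **argv)
        {
                DIR *d;
                struct dirent *de;
                long n = 0;

                if (argc != 2)
                        errx(1, "usage: %s directory", argv[0]);
                if ((d = opendir(argv[1])) == NULL)
                        err(1, "opendir %s", argv[1]);
                while ((de = readdir(d)) != NULL)
                        n++;
                closedir(d);
                printf("%ld entries\n", n);
                return (0);
        }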
From owner-freebsd-fs@FreeBSD.ORG  Sun Jul 24 18:44:09 2011
From: Kostik Belousov
Date: Sun, 24 Jul 2011 21:44:04 +0300
To: Ben Kaduk
Cc: freebsd-fs@freebsd.org
Subject: Re: svn commit: r220241 - in stable/8/sys: kern sys

On Sun, Jul 24, 2011 at 01:04:28PM -0400, Ben Kaduk wrote:
> [replying to -fs since that is where the original discussion of adding
> O_CLOEXEC occurred]
>
> On Fri, Apr 1, 2011 at 8:50 AM, Konstantin Belousov wrote:
> > Author: kib
> > Date: Fri Apr  1 12:50:29 2011
> > New Revision: 220241
> > URL: http://svn.freebsd.org/changeset/base/220241
> >
> > Log:
> >   MFC r219999:
> >   Add O_CLOEXEC flag to open(2) and fhopen(2).
>
> I saw mail go by on debian-bsd@lists.debian.org that they are going to
> pick up on these O_CLOEXEC definitions and export them, which included
> the comment:
>     No O_SEARCH yet, since FreeBSD doesn't seem to implement it.

What do you mean by exporting them ?

> Would there be any reason for us to support O_SEARCH?
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
> does not make it very clear to me whether we would want to....

We do not support O_SEARCH because nobody implemented it yet.
From owner-freebsd-fs@FreeBSD.ORG  Sun Jul 24 18:50:48 2011
From: Benjamin Kaduk
Date: Sun, 24 Jul 2011 14:50:44 -0400 (EDT)
To: Kostik Belousov
Cc: freebsd-fs@freebsd.org
Subject: Re: svn commit: r220241 - in stable/8/sys: kern sys
On Sun, 24 Jul 2011, Kostik Belousov wrote:

> On Sun, Jul 24, 2011 at 01:04:28PM -0400, Ben Kaduk wrote:
>> [replying to -fs since that is where the original discussion of adding
>> O_CLOEXEC occurred]
>>
>> On Fri, Apr 1, 2011 at 8:50 AM, Konstantin Belousov wrote:
>>> Author: kib
>>> Date: Fri Apr  1 12:50:29 2011
>>> New Revision: 220241
>>> URL: http://svn.freebsd.org/changeset/base/220241
>>>
>>> Log:
>>>   MFC r219999:
>>>   Add O_CLOEXEC flag to open(2) and fhopen(2).
>>
>> I saw mail go by on debian-bsd@lists.debian.org that they are going to
>> pick up on these O_CLOEXEC definitions and export them, which included
>> the comment:
>>     No O_SEARCH yet, since FreeBSD doesn't seem to implement it.
> What do you mean by exporting them ?

Per http://lists.debian.org/debian-bsd/2011/07/msg00299.html , it is not
possible for them to use our sys/fcntl.h directly, so its contents must be
copied into a bits/fcntl.h that enters somehow into their framework.  (I
am not familiar with how this framework works.)

>
>> Would there be any reason for us to support O_SEARCH?
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
>> does not make it very clear to me whether we would want to....
>
> We do not support O_SEARCH because nobody implemented it yet.

Sure, but is it worth filing a PR as a reminder?

-Ben
From owner-freebsd-fs@FreeBSD.ORG  Sun Jul 24 18:56:34 2011
From: Kostik Belousov
Date: Sun, 24 Jul 2011 21:56:29 +0300
To: Benjamin Kaduk
Cc: freebsd-fs@freebsd.org
Subject: Re: svn commit: r220241 - in stable/8/sys: kern sys

On Sun, Jul 24, 2011 at 02:50:44PM -0400, Benjamin Kaduk wrote:
> On Sun, 24 Jul 2011, Kostik Belousov wrote:
> > On Sun, Jul 24, 2011 at 01:04:28PM -0400, Ben Kaduk wrote:
> >> [replying to -fs since that is where the original discussion of adding
> >> O_CLOEXEC occurred]
> >>
> >> On Fri, Apr 1, 2011 at 8:50 AM, Konstantin Belousov wrote:
> >>> Author: kib
> >>> Date: Fri Apr  1 12:50:29 2011
> >>> New Revision: 220241
> >>> URL: http://svn.freebsd.org/changeset/base/220241
> >>>
> >>> Log:
> >>>   MFC r219999:
> >>>   Add O_CLOEXEC flag to open(2) and fhopen(2).
> >>
> >> I saw mail go by on debian-bsd@lists.debian.org that they are going to
> >> pick up on these O_CLOEXEC definitions and export them, which included
> >> the comment:
> >>     No O_SEARCH yet, since FreeBSD doesn't seem to implement it.
> > What do you mean by exporting them ?
>
> Per http://lists.debian.org/debian-bsd/2011/07/msg00299.html , it is not
> possible for them to use our sys/fcntl.h directly, so its contents must
> be copied into a bits/fcntl.h that enters somehow into their framework.
> (I am not familiar with how this framework works.)
>
> >> Would there be any reason for us to support O_SEARCH?
> >> http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
> >> does not make it very clear to me whether we would want to....
> >
> > We do not support O_SEARCH because nobody implemented it yet.
>
> Sure, but is it worth filing a PR as a reminder?

Without the patch?  No.
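For reference, the semantics being asked about: per the Open Group page
linked earlier, O_SEARCH opens a directory for use as the dirfd of the
*at() calls when the caller has only search (x) permission on it, not read
permission.  A purely hypothetical sketch -- FreeBSD defines no O_SEARCH,
so this does not build there, and the paths are placeholders:

        #include <fcntl.h>
        #include <err.h>
        #include <unistd.h>

        int
        main(void)
        {
                /* Hypothetical: O_SEARCH is not in FreeBSD's <fcntl.h>. */
                int dfd = open("/some/dir", O_SEARCH);
                if (dfd == -1)
                        err(1, "open directory");

                /* The descriptor is only good for lookups relative to it. */
                int fd = openat(dfd, "file-inside", O_RDONLY);
                if (fd == -1)
                        err(1, "openat");

                close(fd);
                close(dfd);
                return (0);
        }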
From owner-freebsd-fs@FreeBSD.ORG  Sun Jul 24 22:22:10 2011
From: Gary Corcoran
Date: Sun, 24 Jul 2011 17:52:25 -0400
To: freebsd-fs@freebsd.org
Subject: 3TB drives on ZFS and booting

I have seen conflicting information on the internet about this, and so
I would like a direct answer from someone who knows for sure.  Does
FreeBSD's ZFS work with 3TB drives, and is it possible to do a ZFS-only
(i.e. boot from ZFS) installation with 3TB drives on FreeBSD?  I presume
that since ZFS was designed to handle huge filesystems, it would have no
problem with 3TB drives, but I guess the real question is the ZFS boot
code - can it currently handle >2TB drives?  Bottom line: would I be able
to successfully build (and of course boot) a FreeBSD ZFS-only system
using only 3TB drives?
Thanks,
Gary

From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 05:42:50 2011
From: Attila Nagy
Date: Mon, 25 Jul 2011 07:42:47 +0200
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS and large directories - caveat report

On 07/21/11 18:38, Ivan Voras wrote:
> On 21 July 2011 17:50, Freddie Cash wrote:
>> On Thu, Jul 21, 2011 at 8:45 AM, Ivan Voras wrote:
>>> Is there an equivalent of UFS dirhash memory setting for ZFS? (i.e. the
>>> size of the metadata cache)
>> vfs.zfs.arc_meta_limit
>>
>> This sets the amount of ARC that can be used for metadata. The default
>> is 1/8th of ARC, I believe. This setting lets you use "primarycache=all"
>> (store metadata and file data in ARC) but then tune how much is used
>> for each.
>>
>> Not sure if that will help in your case or not, but it's a sysctl you
>> can play with.
> I don't think that it works, or at least is not as efficient as dirhash:
>
> www:~> sysctl -a | grep meta
> kern.metadelay: 28
> vfs.zfs.mfu_ghost_metadata_lsize: 129082368
> vfs.zfs.mfu_metadata_lsize: 116224
> vfs.zfs.mru_ghost_metadata_lsize: 113958912
> vfs.zfs.mru_metadata_lsize: 16384
> vfs.zfs.anon_metadata_lsize: 0
> vfs.zfs.arc_meta_limit: 322412800
> vfs.zfs.arc_meta_used: 506907792
> kstat.zfs.misc.arcstats.demand_metadata_hits: 4471705
> kstat.zfs.misc.arcstats.demand_metadata_misses: 2110328
> kstat.zfs.misc.arcstats.prefetch_metadata_hits: 27
> kstat.zfs.misc.arcstats.prefetch_metadata_misses: 51
>
> arc_meta_used is nearly 500 MB which should be enough even in this
> case. With filenames of 32 characters, all the filenames alone for
> 130,000 files in a directory take about 4 MB - I doubt the ZFS
> introduces so much extra metadata it doesn't fit in 500 MB.
>
> I am now deleting the session files, and I hope it will not take days
> to complete...
>
Worse than that: I've seen a similar issue with hashed directories
holding about 1M+ files.  After deleting all those files, even a find
on the (now empty) directories took ages...
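A quick way to watch the two tunables quoted above from a program instead
of sysctl(8).  A sketch only: it assumes (as the values in the output
above suggest) that both OIDs are exported as 64-bit integers.

        #include <sys/types.h>
        #include <sys/sysctl.h>
        #include <err.h>
        #include <stdint.h>
        #include <stdio.h>

        static uint64_t
        get64(const char *name)
        {
                uint64_t v;
                size_t len = sizeof(v);

                /* Assumes the OID is a 64-bit integer. */
                if (sysctlbyname(name, &v, &len, NULL, 0) == -1)
                        err(1, "%s", name);
                return (v);
        }

        int
        main(void)
        {
                uint64_t limit = get64("vfs.zfs.arc_meta_limit");
                uint64_t used = get64("vfs.zfs.arc_meta_used");

                printf("arc_meta_used %ju of %ju bytes (%.1f%%)\n",
                    (uintmax_t)used, (uintmax_t)limit,
                    limit ? 100.0 * used / limit : 0.0);
                return (0);
        }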
From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 11:07:06 2011
From: FreeBSD bugmaster
Date: Mon, 25 Jul 2011 11:07:05 GMT
To: freebsd-fs@FreeBSD.org
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/159077  fs  [zfs] Can't cd .. with latest zfs version
o kern/159048  fs  [smbfs] smb mount corrupts large files
o kern/159045  fs  [zfs] [hang] ZFS scrub freezes system
o kern/158839  fs  [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802  fs  [amd] amd(8) ICMP storm and unkillable process.
o kern/158711  fs  [ffs] [panic] panic in ffs_blkfree and ffs_valloc
o kern/158231  fs  [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929  fs  [nfs] NFS slow read
o kern/157728  fs  [zfs] zfs (v28) incremental receive may leave behind t
o kern/157722  fs  [geli] unable to newfs a geli encrypted partition
o kern/157399  fs  [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs  [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156933  fs  [zfs] ZFS receive after read on readonly=on filesystem
o kern/156797  fs  [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781  fs  [zfs] zfs is losing the snapshot directory,
p kern/156545  fs  [ufs] mv could break UFS on SMP systems
o kern/156193  fs  [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156168  fs  [nfs] [panic] Kernel panic under concurrent access ove
o kern/156039  fs  [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs  [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs  [zfs] [panic] kernel panic with zfs
o kern/155411  fs  [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199  fs  [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs  [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs  [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs  [msdosfs] Unable to create directories on external USB
o kern/154491  fs  [smbfs] smb_co_lock: recursive lock for object 1
o kern/154447  fs  [zfs] [panic] Occasional panics - solaris assert somew
p kern/154228  fs  [md] md getting stuck in wdrain state
o kern/153996  fs  [zfs] zfs root mount error while kernel is not located
o kern/153847  fs  [nfs] [panic] Kernel panic from incorrect m_free in nf
o kern/153753  fs  [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs  [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs  [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680  fs  [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs  [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs  [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs  [zfs] locking directories/files in ZFS
o bin/153258   fs  [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173  fs  [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126  fs  [zfs] vdev failure, zpool=peegel type=vdev.too_small
p kern/152488  fs  [tmpfs] [patch] mtime of file updated when only inode
o kern/152022  fs  [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs  [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs  [zfs] page fault under load in /sbin/zfs
o kern/151845  fs  [smbfs] [patch] smbfs should be upgraded to support Un
o bin/151713   fs  [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs  [zfs] disk wait bug
o kern/151629  fs  [fs] [patch] Skip empty directory entries during name
o kern/151330  fs  [zfs] will unshare all zfs filesystem after execute a
o kern/151326  fs  [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251  fs  [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs  [zfs] can't delete zfs snapshot
o kern/151111  fs  [zfs] vnodes leakage during zfs unmount
o kern/150503  fs  [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs  [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs  [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs  [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs
                   zpool(1): zpool import -d /dev tries to open weird dev
o kern/149208  fs  mksnap_ffs(8) hang/deadlock
o kern/149173  fs  [patch] [zfs] make OpenSolaris installa
o kern/149015  fs  [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs  [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013  fs  [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs  [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs  [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs  [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs  [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs  [nfs] UDP NFS causes overload
o kern/148138  fs  [zfs] zfs raidz pool commands freeze
o kern/147903  fs  [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs  [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs  [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147560  fs  [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420  fs  [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941  fs  [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs  [zfs] zpool import hangs with checksum errors
o kern/146708  fs  [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs  [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs  [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs  [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs  [xfs] [panic] Kernel panics shortly after mounting an
o bin/145309   fs  bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs  [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs  [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs  [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs  [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs  [nfs] nfsd performs abysmally under load
o kern/144929  fs  [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs  [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs  [panic] Kernel panic on online filesystem optimization
s kern/144415  fs  [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs  [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs  [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs  [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs  [nfs] NFSv4 client strange work ...
o kern/143184  fs  [zfs] [lor] zfs/bufwait LOR
o kern/142914  fs  [zfs] ZFS performance degradation over time
o kern/142878  fs  [zfs] [vfs] lock order reversal
o kern/142597  fs  [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs  [zfs] [lor] allproc/zfs LOR
o kern/142466  fs  Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs  [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs  [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs  [msdosfs] [panic] Kernel panic.
                   msdofs: file name leng
o kern/141463  fs  [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs  [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs  [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs  [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs  [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs  [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs  [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs  [zfs] snapshot crash
o kern/140068  fs  [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs  [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs  [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs  [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs  [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs  [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs  [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs  [panic] ffs_blkfree: freeing free block
o kern/138421  fs  [ufs] [patch] remove UFS label limitations
o kern/138202  fs  mount_msdosfs(1) see only 2Gb
o kern/136968  fs  [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs  [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs  [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs  [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs  [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs  [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs  [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs  [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs  [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs  [zfs] Hot spares are rather cold...
o kern/133676  fs  [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133174  fs  [msdosfs] [patch] msdosfs must support multibyte inter
o kern/132960  fs  [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs  reboot causes filesystem corruption (failure to sync b
o kern/132331  fs  [ufs] [lor] LOR ufs and syncer
o kern/132237  fs  [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs  [panic] File System Hard Crashes
o kern/131441  fs  [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs  [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs  [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs  makefs: error "Bad file descriptor" on the mount poin
o kern/130920  fs  [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs  [nullfs] Error by check nullfs
f kern/130133  fs  [panic] [zfs] 'kmem_map too small' caused by make clea
o kern/129760  fs  [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs  [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs  [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs  [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs  [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
f kern/127375  fs  [zfs] If vm.kmem_size_max>"1073741823" then write spee
o bin/127270   fs  fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs  [panic] mount(8): trying to mount a write protected zi
f kern/126703  fs  [panic] [zfs] _mtx_lock_sleep: recursed on non-recursi
o kern/126287  fs  [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs  [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs  [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs  [msdosfs] corrupts new files
f sparc/123566 fs  [zfs] zpool import issue: EOVERFLOW
o kern/122380  fs  [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs  [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs  [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121366   fs  [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs  [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483  fs  [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs  [ntfs] [patch] Sync style changes between NetBSD and F
f kern/120210  fs  [zfs] [panic] reboot after panic: solaris assert: arc_
o kern/118912  fs  [2tb] disk sizing/geometry problem with large array
o kern/118713  fs  [minidump] [patch] Display media size required for a k
o bin/118249   fs  [ufs] mv(1): moving a directory changes its mtime
o kern/118126  fs  [nfs] [patch] Poor NFS server write performance
o kern/118107  fs  [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs  [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs  [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs  [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs  [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs  [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs  lack of fsck_cd9660 prevents mounting iso images with
o kern/116583  fs  [ffs] [hang] System freezes for short time when using
o bin/115361   fs  [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs  [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs  [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs  [ufs] snapshot creation panics:
                   snapacct_ufs2: bad blo
o bin/114468   fs  [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs  [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs  [patch] [request] mount(8): add support for relative p
o bin/113049   fs  [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs  [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs  [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs  [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs  [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs  [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs  [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs  [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs  [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406  fs  [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs  [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs  [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs  [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs  [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs  [request] newfs(8) has no option to clear the first 12
o kern/97377   fs  [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs  [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs  [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs  fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs  [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs  [smbfs] smbfs may cause double unlock
o kern/93942   fs  [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs  [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs  [smbfs] [patch] Preserve access and modification time
a kern/90815   fs  [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs  [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs  [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs  [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs  [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs  [smbfs] System reboot while umount smbfs.
o kern/86587   fs  [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs  fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs  [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs  Background-fsck checks one filesystem twice and omits
o kern/73484   fs  [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs  [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs  [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs  fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs  [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs  [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs  [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs  [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs  [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs  [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs  [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs  [smbfs] System reboot with dead smb mount and umount
o kern/33464   fs  [ufs] soft update inconsistencies after system crash
o bin/27687    fs  fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs  [2TB] 32bit NFS servers export wrong negative values t

237 problems total.

From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 16:30:24 2011
From: Gary Palmer
Date: Mon, 25 Jul 2011 16:30:24 GMT
To: freebsd-fs@FreeBSD.org
Subject: Re: kern/159077: Can't cd .. with latest zfs version

The following reply was made to PR kern/159077; it has been noted by GNATS.

From: Gary Palmer
To: Michael Haro
Cc: FreeBSD-gnats-submit@FreeBSD.org
Subject: Re: kern/159077: Can't cd .. with latest zfs version
Date: Mon, 25 Jul 2011 12:25:34 -0400

On Wed, Jul 20, 2011 at 11:37:21PM -0700, Michael Haro wrote:
> >Number:         159077
> >Category:       kern
> >Synopsis:       Can't cd .. with latest zfs version
> >Confidential:   no
> >Severity:       serious
> >Priority:       medium
> >Responsible:    freebsd-bugs
> >State:          open
> >Quarter:
> >Keywords:
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Thu Jul 21 07:10:07 UTC 2011
> >Closed-Date:
> >Last-Modified:
> >Originator:     Michael Haro
> >Release:        FreeBSD 8.2-STABLE amd64
> >Organization:
> >Environment:
> System: FreeBSD backups.mtv.bitsurf.net 8.2-STABLE FreeBSD 8.2-STABLE #1:
> Sat Jul 16 19:26:28 PDT 2011
> root@backups.mtv.bitsurf.net:/usr/obj/usr/src/sys/KERNEL amd64
>
> freebsd 8.2 stable as of july 16th
> zpool version 28
> zfs version 3
>
> >Description:
>
> trying to cd up one level using 'cd ..' gives permission denied
>
> >How-To-Repeat:
>
> use sh or tcsh, not bash...
>
> $ pwd
> /home/mharo
> $ cd ..
> cd: can't cd to ..
> $ ls -ald /home
> drwxr-xr-x  4 root  wheel  4 Nov 29  2009 /home
> $ ls -ald /home/mharo
> drwxr-xr-x  3 mharo  users  15 Jul 20 22:49 /home/mharo
> $ cd /home
> $ pwd
> /home
> $ ls -ald mharo
> drwxr-xr-x  3 mharo  users  15 Jul 20 22:49 mharo
> $ cd mharo
> $ cd ..
> cd: can't cd to ..
>
> so obviously I can cd into /home, just not via ..
>
> $ zfs list -r zroot/home
> NAME               USED  AVAIL  REFER  MOUNTPOINT
> zroot/home         162K  2.70G    26K  /home
> zroot/home/mharo   119K  2.70G  35.5K  /home/mharo

It may be worth unmounting /home/mharo and checking the permissions of
the directory underneath the mount point.  e.g.

% mkdir /tmp/159077
% chmod 0 /tmp/159077
% ls -la /tmp/159077
total 274
d---------   2 root  wheel      512 Jul 25 17:22 .
drwxrwxrwt  57 root  wheel   249344 Jul 25 17:22 ..
% mount /dev/md0 /tmp/159077
% ls -la /tmp/159077
total 276
drwxr-xr-x   3 root  wheel      512 Jul 25 17:21 .
drwxrwxrwt  57 root  wheel   249344 Jul 25 17:22 ..
drwxrwxr-x   2 root  operator   512 Jul 25 17:21 .snap
%

and then as a regular user:

$ cd /tmp/159077/
$ pwd
/tmp/159077
$ ls -la
ls: ..: Permission denied
total 4
drwxr-xr-x  3 root  wheel      512 Jul 25 17:21 .
drwxrwxr-x  2 root  operator   512 Jul 25 17:21 .snap
$ ls -la ..
ls: ..: Permission denied
$ cd ..
cd: can't cd to ..
$

Regards,
Gary

From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 17:01:33 2011
From: Clinton Adams
Date: Mon, 25 Jul 2011 19:01:32 +0200
To: Rick Macklem
Cc: FreeBSD FS
Subject: Re: nfsd server cache flooded, try to increase nfsrc_floodlevel

On Sat, Jul 23, 2011 at 12:16 AM, Rick Macklem wrote:
> Clinton Adams wrote:
> [stuff snipped for brevity]
>>
>> Running four clients now and the LockOwners are steadily climbing,
>> nfsstat consistently reported it as 0 prior to users logging into the
>> nfsv4 test systems - my testing via ssh didn't show anything like
>> this. Attached tcpdump file is from when I first noticed the jump in
>> LockOwners from 0 to ~600. I tried wireshark on this and didn't see
>> any releaselockowner operations.
>>
> [stuff snipped for brevity]
>> OpenOwner  Opens  LockOwner  Locks  Delegs
>>         6    242       2481     22       0
>> Server Cache Stats:
>>   Inprog  Idem  Non-idem   Misses  CacheSize  TCPPeak
>>        0     0         2  2518251       2502     4772
>>
> I've written a small test program:
>   http://people.freebsd.org/~rmacklem/childlock.c (also attached)
>
> where a parent process opens a file and then forks children that do
> lock ops and then exit. (I'm guessing that this is what some process
> in your clients are doing, that result in the LockOwner count growing.)
>
> When I run this program on Fedora15, it generates ReleaseLockOwner Ops
> and the LockOwner count doesn't increase as it runs.
>
> You can run this program by giving it an argument that can be any file
> on the nfsv4 mount for which you have read/write access, then watch
> the server via "nfsstat -e -s" to see if the LockOwner count increases.
>
> If the LockOwner count does increase, then it appears that a newer Linux
> kernel will avoid the problem.

Yes, a client running a newer kernel (2.6.38) does generate the
release_lockowner ops.

Thanks for all the help!

> If you are interested in what the packet trace looks like when running
> the program on Fedora15, it's at:
>   http://people.freebsd.org/~rmacklem/childlock.pcap
>
> rick
> ps: The FreeBSD NFSv4 client doesn't currently generate the
>     ReleaseLockOwner Ops for this case either. I need to come up with a
>     patch that does that.
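Not the childlock.c linked above, just a sketch of the pattern Rick
describes: the parent opens the file, each child takes and drops a lock
and exits, so every child shows up as a distinct lock owner on the server
unless the client sends ReleaseLockOwner.

        #include <sys/wait.h>
        #include <err.h>
        #include <fcntl.h>
        #include <unistd.h>

        int
        main(int argc, char **argv)
        {
                struct flock fl;
                int fd, i;

                if (argc != 2)
                        errx(1, "usage: %s <file on NFSv4 mount>", argv[0]);
                if ((fd = open(argv[1], O_RDWR)) == -1)
                        err(1, "open");
                for (i = 0; i < 100; i++) {
                        if (fork() == 0) {
                                /* Lock the first byte, unlock it, exit. */
                                fl.l_type = F_WRLCK;
                                fl.l_whence = SEEK_SET;
                                fl.l_start = 0;
                                fl.l_len = 1;
                                if (fcntl(fd, F_SETLKW, &fl) == -1)
                                        err(1, "lock");
                                fl.l_type = F_UNLCK;
                                (void)fcntl(fd, F_SETLK, &fl);
                                _exit(0);
                        }
                        wait(NULL);
                }
                return (0);
        }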
From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 17:30:05 2011
From: John Baldwin
Date: Mon, 25 Jul 2011 09:39:31 -0400
To: freebsd-fs@freebsd.org
Subject: Re: 3TB drives on ZFS and booting

On Sunday, July 24, 2011 5:52:25 pm Gary Corcoran wrote:
> I have seen conflicting information on the internet about this, and so
> I would like a direct answer from someone who knows for sure.  Does
> FreeBSD's ZFS work with 3TB drives, and is it possible to do a ZFS-only
> (i.e. boot from ZFS) installation with 3TB drives on FreeBSD?  I presume
> that since ZFS was designed to handle huge filesystems, it would have no
> problem with 3TB drives, but I guess the real question is the ZFS boot
> code - can it currently handle >2TB drives?
> Bottom line: would I be able to successfully build (and of course boot)
> a FreeBSD ZFS-only system using only 3TB drives?

You probably want to use GPT instead of MBR, but the GPT ZFS boot code
should fully handle 64-bit LBAs just fine.
--
John Baldwin

From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 21:58:36 2011
From: Rick Macklem
Date: Mon, 25 Jul 2011 17:58:35 -0400 (EDT)
To: Zack Kirsch
Cc: freebsd-fs@freebsd.org
Subject: Re: nfsd server cache flooded, try to increase nfsrc_floodlevel

Zack Kirsch wrote:
> Just wanted to add a bit of Isilon color. We've hit this limit before,
> but I believe it was mostly due to strange client behavior of 1) Using
> a new lockowner for each lock and 2) Using a new TCP connection for
> each 'test run'.

When I saw this before, I remarked that this shouldn't be relevant.
I realize now that you were referring to a test environment (not a real
NFS client) where it keeps creating new TCP connections, even if the
previous connection wasn't broken due to a network partitioning or
similar.  Sorry about that.

> As far as I know, we haven't hit this in the field.

It appears that this case was a result of using an old Linux NFSv4
client and was resolved via a kernel upgrade.  (i.e. I suspect there are
others out there that will run into the same thing sooner or later.)

> We've done a few things to combat this problem:
> 1) We increased the floodlevel to 65536.
> 2) We made the floodlevel configurable via sysctl.
> 3) We made significant changes to the replay cache itself. Specific
>    gains were drastic performance improvements and freeing of cache
>    entries from stale TCP connections.

It is important to note that the request cache holds onto replies for
inactive TCP connections because it assumes that the client might be
network partitioned for long enough that it is forced to reconnect using
a fresh TCP connection and will then retry all outstanding RPCs.  This
could take a looonnngggg time to happen, so these replies can't be
free'd quickly, or the whole purpose of the cache (avoiding redoing
non-idempotent operations when an RPC is retried) is defeated.

The fact that some artificial test program (pynfs maybe?)
chooses to do fresh TCP connections isn't relevant imho, since it isn't
a real client and, as far as I know, real clients only reconnect when
the old TCP connection no longer works.

I thought I'd try and clarify this for anyone interested, rick

From owner-freebsd-fs@FreeBSD.ORG  Mon Jul 25 23:22:42 2011
From: Rick Macklem
Date: Mon, 25 Jul 2011 19:22:40 -0400 (EDT)
To: FreeBSD FS
Subject: Does msodsfs_readdir() require a exclusively locked vnode

Hi,

Currently both NFS servers set the vnode lock LK_SHARED and so do the
local syscalls (at least that's how it looks by inspection?).

Peter Holm just posted me this panic, where a test for an exclusive
vnode lock fails in msdosfs_readdir().

KDB: stack backtrace:
db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at kdb_backtrace+0x2a
vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23
assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at assert_vop_elocked+0x55
pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45
msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at msdosfs_readdir+0x528
VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at VOP_READDIR_APV+0xc5
nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at nfsrvd_readdir+0x38e
nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at nfsrvd_dorpc+0x1f79
nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at nfssvc_program+0x40f
svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) at svc_run_internal+0x952
svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) at svc_thread_start+0x10
at svc_thread_start+0x10 fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 fork_trampoline() at fork_trampoline+0x8 --- trap 0x804c12e, eip = 0xc, esp = 0x33, ebp = 0x1 --- pcbmap: 0xc7f20ae0 is not exclusive locked but should be KDB: enter: lock violation So, does anyone know if the msdosfs_readdir() really requires a LK_EXCLUSIVE locked vnode or is the ASSERT_VOP_ELOCKED() too strong in pcbmap()? rick From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 08:30:20 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EAEE01065674; Tue, 26 Jul 2011 08:30:20 +0000 (UTC) (envelope-from dim@FreeBSD.org) Received: from tensor.andric.com (cl-327.ede-01.nl.sixxs.net [IPv6:2001:7b8:2ff:146::2]) by mx1.freebsd.org (Postfix) with ESMTP id AF8118FC1A; Tue, 26 Jul 2011 08:30:20 +0000 (UTC) Received: from [IPv6:2001:7b8:3a7:0:8c40:e6eb:d519:2a58] (unknown [IPv6:2001:7b8:3a7:0:8c40:e6eb:d519:2a58]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by tensor.andric.com (Postfix) with ESMTPSA id 7FA945C37; Tue, 26 Jul 2011 10:30:19 +0200 (CEST) Message-ID: <4E2E7B1B.2020906@FreeBSD.org> Date: Tue, 26 Jul 2011 10:30:19 +0200 From: Dimitry Andric Organization: The FreeBSD Project User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: John Baldwin References: <4E2C9419.4000205@rcn.com> <201107250939.31746.jhb@freebsd.org> In-Reply-To: <201107250939.31746.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: 3TB drives on ZFS and booting X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 08:30:21 -0000 On 2011-07-25 15:39, John Baldwin wrote: > On Sunday, July 24, 2011 5:52:25 pm Gary Corcoran wrote: ... >> Bottom line: would I be able to successfully build (and of course boot) a FreeBSD >> ZFS-only system using only 3TB drives? > You probably want to use GPT instead of MBR, but the GPT ZFS boot code shoul > fully handle 64-bit LBAs just fine. Isn't that also dependent on the BIOS's ability to handle 64-bit LBA's? 
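[Editorial aside, not part of any mail in this thread: the 2TB boundary raised above is just the 32-bit LBA ceiling at 512-byte sectors. A minimal standalone sketch of that arithmetic follows; the program and the nominal 3TB figure are illustrative only.]

/*
 * Illustrative only: why a "3TB" drive needs 64-bit LBAs end to end
 * (partition scheme, boot code, BIOS EDD packet interface), assuming
 * 512-byte sectors.
 */
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	const uint64_t sector_size = 512;
	uint64_t lba32_sectors = (uint64_t)UINT32_MAX + 1;	/* 2^32 sectors */
	uint64_t lba32_bytes = lba32_sectors * sector_size;	/* = 2 TiB */
	uint64_t drive_bytes = 3000000000000ULL;		/* nominal "3TB" drive */
	uint64_t last_lba = drive_bytes / sector_size - 1;

	printf("32-bit LBA limit: %ju bytes (%.2f TiB)\n",
	    (uintmax_t)lba32_bytes, (double)lba32_bytes / (1ULL << 40));
	printf("last LBA on a 3TB drive: %ju (%s UINT32_MAX)\n",
	    (uintmax_t)last_lba, last_lba > UINT32_MAX ? "above" : "below");
	return (0);
}

Since the last LBA of a 3TB drive does not fit in 32 bits, both the MBR scheme (32-bit start/size fields) and a BIOS without 64-bit EDD support become limiting factors above 2 TiB, which is why GPT plus an EDD-capable BIOS is the combination discussed in this thread.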
From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 09:04:46 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 939E5106566B for ; Tue, 26 Jul 2011 09:04:46 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 2F1378FC14 for ; Tue, 26 Jul 2011 09:04:45 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p6Q94g8r008125 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 26 Jul 2011 12:04:42 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p6Q94fUS017456; Tue, 26 Jul 2011 12:04:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p6Q94fx7017455; Tue, 26 Jul 2011 12:04:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 26 Jul 2011 12:04:41 +0300 From: Kostik Belousov To: Rick Macklem Message-ID: <20110726090441.GD17489@deviant.kiev.zoral.com.ua> References: <2086374310.991475.1311636160720.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="6iAIcqJi9p/aaN8Y" Content-Disposition: inline In-Reply-To: <2086374310.991475.1311636160720.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: FreeBSD FS Subject: Re: Does msodsfs_readdir() require a exclusively locked vnode X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 09:04:46 -0000 --6iAIcqJi9p/aaN8Y Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote: > Hi, >=20 > Currently both NFS servers set the vnode lock LK_SHARED > and so do the local syscalls (at least that's how it looks > by inspection?). >=20 > Peter Holm just posted me this panic, where a test for an > exclusive vnode lock fails in msdosfs_readdir(). > KDB: stack backtrace: > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,...) a= t db_trace_self_wrapper+0x26 > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at kdb_ba= cktrace+0x2a > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23 > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at assert_vop_eloc= ked+0x55 > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45 > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at msdosfs_rea= ddir+0x528 > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at VOP_READDIR= _APV+0xc5 > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) 
at nfsrvd_readd= ir+0x38e > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at nfsrvd_dorpc+0x1f79 > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at nfssvc_program+0x= 40f > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) at svc= _run_internal+0x952 > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) at svc_thre= ad_start+0x10 > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 > fork_trampoline() at fork_trampoline+0x8 > --- trap 0x804c12e, eip =3D 0xc, esp =3D 0x33, ebp =3D 0x1 --- > pcbmap: 0xc7f20ae0 is not exclusive locked but should be > KDB: enter: lock violation >=20 > So, does anyone know if the msdosfs_readdir() really requires a LK_EXCLUS= IVE > locked vnode or is the ASSERT_VOP_ELOCKED() too strong in pcbmap()? Yes, msdosfs currently requires all vnode locks to be exclusive. One of the reasons is that each denode (the msdosfs-private vnode data) carries the fat entries cache, and this cache is updated even by the operations that do not modify vnode from the VFS POV. The locking regime is enforced by the getnewvnode() initializing the vnode lock with LK_NOSHARE flag, and msdosfs code not calling VN_LOCK_ASHARE() on the newly instantiated vnode. My question is, was the vnode in question locked at all ? --6iAIcqJi9p/aaN8Y Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk4ugykACgkQC3+MBN1Mb4hh2ACfS72MfHc6jb7XUh7FsaqkV8py 0lsAn1QwwRgW1mdqjxD5ACBsWz35fci2 =/7qP -----END PGP SIGNATURE----- --6iAIcqJi9p/aaN8Y-- From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 13:22:35 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9862D1065670; Tue, 26 Jul 2011 13:22:35 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 70F398FC0C; Tue, 26 Jul 2011 13:22:35 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 11F2946B62; Tue, 26 Jul 2011 09:22:35 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 9DE8C8A02C; Tue, 26 Jul 2011 09:22:34 -0400 (EDT) From: John Baldwin To: Dimitry Andric Date: Tue, 26 Jul 2011 09:03:58 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <4E2C9419.4000205@rcn.com> <201107250939.31746.jhb@freebsd.org> <4E2E7B1B.2020906@FreeBSD.org> In-Reply-To: <4E2E7B1B.2020906@FreeBSD.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201107260903.58265.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 26 Jul 2011 09:22:34 -0400 (EDT) Cc: freebsd-fs@freebsd.org Subject: Re: 3TB drives on ZFS and booting X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 13:22:35 -0000 On Tuesday, July 26, 2011 4:30:19 am Dimitry Andric wrote: > On 2011-07-25 15:39, John Baldwin wrote: > > On Sunday, July 24, 2011 5:52:25 pm Gary Corcoran wrote: > ... 
> >> Bottom line: would I be able to successfully build (and of course boot) a FreeBSD > >> ZFS-only system using only 3TB drives? > > You probably want to use GPT instead of MBR, but the GPT ZFS boot code shoul > > fully handle 64-bit LBAs just fine. > > Isn't that also dependent on the BIOS's ability to handle 64-bit LBA's? Yes, but the original EDD 1.0 spec that included the 'packet' and extended INT 13h functions included 64-bit LBAs, so at this point I would expect most BIOSes to support that. Also, only BIOSes for controllers that support logical disks > 2TB (either RAID volumes or large physical disks) have to actually support having the upper 32-bits be non-zero. I strongly suspect that that is in fact true. That is, if you have a controller new enough to support a 3 TB drive, it's accompanying BIOS ROM should support 64-bit LBAs. -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 13:27:03 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5D3A51065670 for ; Tue, 26 Jul 2011 13:27:03 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 1BD298FC16 for ; Tue, 26 Jul 2011 13:27:03 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1Qlhf3-0003b0-26 for freebsd-fs@freebsd.org; Tue, 26 Jul 2011 15:27:01 +0200 Received: from ib-jtotz.ib.ic.ac.uk ([155.198.110.220]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 26 Jul 2011 15:27:01 +0200 Received: from jtotz by ib-jtotz.ib.ic.ac.uk with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 26 Jul 2011 15:27:01 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Johannes Totz Date: Tue, 26 Jul 2011 14:26:48 +0100 Lines: 39 Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: ib-jtotz.ib.ic.ac.uk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.18) Gecko/20110616 Thunderbird/3.1.11 Subject: panic: snapacct_ufs2: bad block X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 13:27:03 -0000 Hi! Just got a panic on my 8-stable box: panic: snapacct_ufs2: bad block cpuid = 0 KDB: stack backtrace: #0 0xffffffff805fd350 at kdb_backtrace+0x60 #1 0xffffffff805cb194 at panic+0x1b4 #2 0xffffffff807eed0e at snapacct_ufs2+0xfe #3 0xffffffff807ee55f at indiracct_ufs2+0x2ff #4 0xffffffff807ee4f7 at indiracct_ufs2+0x297 #5 0xffffffff807ef0ce at expunge_ufs2+0x30e #6 0xffffffff807f2f79 at ffs_snapshot+0x1e59 #7 0xffffffff80802e38 at ffs_mount+0x1628 #8 0xffffffff80654f1c at vfs_donmount+0xf9c #9 0xffffffff80655903 at nmount+0x73 #10 0xffffffff8060a17e at syscallenter+0x2fe #11 0xffffffff808bcd31 at syscall+0x41 #12 0xffffffff808a52f2 at Xfast_syscall+0xe2 This box is running: FreeBSD XXX 8.2-STABLE FreeBSD 8.2-STABLE #0 r224227: Wed Jul 20 16:55:23 BST 2011 root@XXX:/usr/obj/usr/src/sys/GENERIC amd64 The crash happened during automated backup with dump(8) on the root file system. There was plenty of free space left. I have deleted all remaining snapshot files now. 
Either dump or savecore didnt work, so that's the only info i have (interestingly the above backtrace ended up in the logs anyway). This panic has happened before (twice or so) with older versions of fbsd. Would it be prudent to newfs and restore from backup, just to make sure there are no remaining glitches? Johannes From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 14:07:30 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E9111106566C for ; Tue, 26 Jul 2011 14:07:29 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id A3F218FC0C for ; Tue, 26 Jul 2011 14:07:29 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAILJLk6DaFvO/2dsb2JhbAA1AQEFKQRGEh0YAgINBx4CFlEHhG2jfrkTkUCBK4F7gguBDwSScogxiEs X-IronPort-AV: E=Sophos;i="4.67,269,1309752000"; d="scan'208";a="132289229" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 26 Jul 2011 10:07:28 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C2301B3F08; Tue, 26 Jul 2011 10:07:28 -0400 (EDT) Date: Tue, 26 Jul 2011 10:07:28 -0400 (EDT) From: Rick Macklem To: Kostik Belousov Message-ID: <429452924.1012322.1311689248782.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110726090441.GD17489@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: FreeBSD FS Subject: Re: Does msodsfs_readdir() require a exclusively locked vnode X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 14:07:30 -0000 Kostik Belousov wrote: > On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote: > > Hi, > > > > Currently both NFS servers set the vnode lock LK_SHARED > > and so do the local syscalls (at least that's how it looks > > by inspection?). > > > > Peter Holm just posted me this panic, where a test for an > > exclusive vnode lock fails in msdosfs_readdir(). > > KDB: stack backtrace: > > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,...) > > at db_trace_self_wrapper+0x26 > > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at > > kdb_backtrace+0x2a > > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23 > > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at > > assert_vop_elocked+0x55 > > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45 > > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at > > msdosfs_readdir+0x528 > > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at > > VOP_READDIR_APV+0xc5 > > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at > > nfsrvd_readdir+0x38e > > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at > > nfsrvd_dorpc+0x1f79 > > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at > > nfssvc_program+0x40f > > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) > > at svc_run_internal+0x952 > > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) 
at > > svc_thread_start+0x10 > > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 > > fork_trampoline() at fork_trampoline+0x8 > > --- trap 0x804c12e, eip = 0xc, esp = 0x33, ebp = 0x1 --- > > pcbmap: 0xc7f20ae0 is not exclusive locked but should be > > KDB: enter: lock violation > > > > So, does anyone know if the msdosfs_readdir() really requires a > > LK_EXCLUSIVE > > locked vnode or is the ASSERT_VOP_ELOCKED() too strong in pcbmap()? > > Yes, msdosfs currently requires all vnode locks to be exclusive. One > of > the reasons is that each denode (the msdosfs-private vnode data) > carries > the fat entries cache, and this cache is updated even by the > operations > that do not modify vnode from the VFS POV. > > The locking regime is enforced by the getnewvnode() initializing the > vnode > lock with LK_NOSHARE flag, and msdosfs code not calling > VN_LOCK_ASHARE() > on the newly instantiated vnode. > > My question is, was the vnode in question locked at all ? I think the problem is that I do a LK_DOWNGRADE. From a quick look at __lockmgr_args(), it doesn't check LK_NOSHARE for a LK_DOWNGRADE. Maybe __lockmgr_args() should have something like: if (op == LK_DOWNGRADE && (lk->lock_object.lo_flags & LK_NOSHARE)) return (0); /* noop */ after the if (op == LK_SHARED && (lk->lock_object.lo_flags & LK_NOSHARE)) op = LK_EXCLUSIVE; lines? Anyhow, I'll get pho@ to test a patch without the LK_DOWNGRADE in it. (It was pretty useless and would go away soon anyhow, once the lkflags argument to VFS_FHTOVP() gets used.) Thanks for the info, rick From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 14:22:00 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 809B9106564A; Tue, 26 Jul 2011 14:22:00 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 1EABA8FC19; Tue, 26 Jul 2011 14:21:59 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p6QELuVH036984 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 26 Jul 2011 17:21:56 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p6QELubY077960; Tue, 26 Jul 2011 17:21:56 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p6QELuP2077959; Tue, 26 Jul 2011 17:21:56 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 26 Jul 2011 17:21:56 +0300 From: Kostik Belousov To: Rick Macklem Message-ID: <20110726142156.GJ17489@deviant.kiev.zoral.com.ua> References: <20110726090441.GD17489@deviant.kiev.zoral.com.ua> <429452924.1012322.1311689248782.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="b4Xfh4GKY2byHbNw" Content-Disposition: inline In-Reply-To: <429452924.1012322.1311689248782.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, 
DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: FreeBSD FS , attilio@freebsd.org Subject: Re: Does msodsfs_readdir() require a exclusively locked vnode X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 14:22:00 -0000 --b4Xfh4GKY2byHbNw Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Jul 26, 2011 at 10:07:28AM -0400, Rick Macklem wrote: > Kostik Belousov wrote: > > On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote: > > > Hi, > > > > > > Currently both NFS servers set the vnode lock LK_SHARED > > > and so do the local syscalls (at least that's how it looks > > > by inspection?). > > > > > > Peter Holm just posted me this panic, where a test for an > > > exclusive vnode lock fails in msdosfs_readdir(). > > > KDB: stack backtrace: > > > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,..= .) > > > at db_trace_self_wrapper+0x26 > > > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at > > > kdb_backtrace+0x2a > > > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23 > > > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at > > > assert_vop_elocked+0x55 > > > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45 > > > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at > > > msdosfs_readdir+0x528 > > > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at > > > VOP_READDIR_APV+0xc5 > > > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at > > > nfsrvd_readdir+0x38e > > > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at > > > nfsrvd_dorpc+0x1f79 > > > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at > > > nfssvc_program+0x40f > > > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) > > > at svc_run_internal+0x952 > > > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) at > > > svc_thread_start+0x10 > > > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 > > > fork_trampoline() at fork_trampoline+0x8 > > > --- trap 0x804c12e, eip =3D 0xc, esp =3D 0x33, ebp =3D 0x1 --- > > > pcbmap: 0xc7f20ae0 is not exclusive locked but should be > > > KDB: enter: lock violation > > > > > > So, does anyone know if the msdosfs_readdir() really requires a > > > LK_EXCLUSIVE > > > locked vnode or is the ASSERT_VOP_ELOCKED() too strong in pcbmap()? > >=20 > > Yes, msdosfs currently requires all vnode locks to be exclusive. One > > of > > the reasons is that each denode (the msdosfs-private vnode data) > > carries > > the fat entries cache, and this cache is updated even by the > > operations > > that do not modify vnode from the VFS POV. > >=20 > > The locking regime is enforced by the getnewvnode() initializing the > > vnode > > lock with LK_NOSHARE flag, and msdosfs code not calling > > VN_LOCK_ASHARE() > > on the newly instantiated vnode. > >=20 > > My question is, was the vnode in question locked at all ? > I think the problem is that I do a LK_DOWNGRADE. From a quick > look at __lockmgr_args(), it doesn't check LK_NOSHARE for a > LK_DOWNGRADE. 
>=20 > Maybe __lockmgr_args() should have something like: > if (op =3D=3D LK_DOWNGRADE && (lk->lock_object.lo_flags & LK_NOSHARE)) > return (0); /* noop */ > after the > if (op =3D=3D LK_SHARED && (lk->lock_object.lo_flags & LK_NOSHARE)) > op =3D LK_EXCLUSIVE; > lines? The RELENG_7 lockmgr does not check the NOSHARE flag on downgrade, but I agree with the essence of your proposal. >=20 > Anyhow, I'll get pho@ to test a patch without the LK_DOWNGRADE in > it. (It was pretty useless and would go away soon anyhow, once the > lkflags argument to VFS_FHTOVP() gets used.) >=20 > Thanks for the info, rick --b4Xfh4GKY2byHbNw Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk4uzYQACgkQC3+MBN1Mb4jUqwCfd0psq10eFKVOBjT6Ih4XKH55 THAAoPXF5vfaXy/LPtnjRmSK9i2d4IdK =TV5Y -----END PGP SIGNATURE----- --b4Xfh4GKY2byHbNw-- From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 20:07:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1DDE5106564A for ; Tue, 26 Jul 2011 20:07:28 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id D27F48FC15 for ; Tue, 26 Jul 2011 20:07:27 +0000 (UTC) Received: by gyf3 with SMTP id 3so703080gyf.13 for ; Tue, 26 Jul 2011 13:07:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=E3mXvGBf5f3u1wP19CHgp8dr/Jaf1PM2VUBGAGQRwjw=; b=hi8Xz1t+tBb1oGLoWPSSG6tDlX6IZYIgcliXLsM+gzfipVFaO2RCmiOQ0K09sFggMA K7NHsp19Am4puuzYGZMa2pU7Q/VFieAhb1wNXVKox9D1uk3SzQLpUYS5zLcGlAKcPnkB zI8rKyRq4cFmkkFujWCDUcjlbiKmIFQ+4g4EI= MIME-Version: 1.0 Received: by 10.236.137.140 with SMTP id y12mr7253226yhi.191.1311710846912; Tue, 26 Jul 2011 13:07:26 -0700 (PDT) Received: by 10.236.103.15 with HTTP; Tue, 26 Jul 2011 13:07:26 -0700 (PDT) In-Reply-To: <4E2C9419.4000205@rcn.com> References: <4E2C9419.4000205@rcn.com> Date: Tue, 26 Jul 2011 21:07:26 +0100 Message-ID: From: krad To: Gary Corcoran Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: 3TB drives on ZFS and booting X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 20:07:28 -0000 On 24 July 2011 22:52, Gary Corcoran wrote: > I have seen conflicting information on the internet about this, and so > I would like a direct answer from someone who knows for sure. Does > FreeBSD's > ZFS work with 3TB drives, and is it possible to do a ZFS-only (i.e. boot > from > ZFS) installation with 3TB drives on FreeBSD? I presume that since ZFS was > designed > to handle huge filesystems, it would have no problem with 3TB drives, but I > guess > the real question is the ZFS boot code - can it currently handle >2TB > drives? > Bottom line: would I be able to successfully build (and of course boot) a > FreeBSD > ZFS-only system using only 3TB drives? 
> > Thanks, > Gary > > ______________________________**_________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/**mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@**freebsd.org > " > The main issue with 3tb drives will be the 4k sector size. This applies to most of the 2tb drives as well. Make sure you use GPT layout and align it. IE make sure each gpt partiion start sector is / by 8 and its size is. Here is mine for an example $ gpart show ada0 => 34 3907029101 ada0 GPT (1.8T) 34 6 - free - (3.0k) 40 128 1 freebsd-boot (64k) 168 6291456 2 freebsd-swap (3.0G) 6291624 3900213229 3 freebsd-zfs (1.8T) also make sure you use gpt boot blocks that can cope with 4k aligned drives. Use the ones from current or these binary ones http://people.freebsd.org/~pjd/zfsboot/ From owner-freebsd-fs@FreeBSD.ORG Tue Jul 26 20:18:56 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 025B1106566B for ; Tue, 26 Jul 2011 20:18:56 +0000 (UTC) (envelope-from mike@sentex.net) Received: from smarthost1.sentex.ca (smarthost1-6.sentex.ca [IPv6:2607:f3e0:0:1::12]) by mx1.freebsd.org (Postfix) with ESMTP id BCD3B8FC0A for ; Tue, 26 Jul 2011 20:18:55 +0000 (UTC) Received: from [IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a] (saphire3.sentex.ca [IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a]) by smarthost1.sentex.ca (8.14.4/8.14.4) with ESMTP id p6QKIsCI081218 for ; Tue, 26 Jul 2011 16:18:54 -0400 (EDT) (envelope-from mike@sentex.net) Message-ID: <4E2F2122.6080204@sentex.net> Date: Tue, 26 Jul 2011 16:18:42 -0400 From: Mike Tancsa Organization: Sentex Communications User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7 MIME-Version: 1.0 To: freebsd-fs@freebsd.org X-Enigmail-Version: 1.1.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.71 on IPv6:2607:f3e0:0:1::12 Subject: zfs error - snapshot: Bad file descriptor X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Jul 2011 20:18:56 -0000 I googled around for an answer to this, but other than reboot, I never found any other strategies. On my backup server (RELENG_8 from Jun 20th, AMD64 8G of RAM), I have one big pool # zpool status -v pool: zbackup1 state: ONLINE scan: scrub repaired 0 in 11h11m with 0 errors on Mon Jul 25 19:51:11 2011 config: NAME STATE READ WRITE CKSUM zbackup1 ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 ada5 ONLINE 0 0 0 ada7 ONLINE 0 0 0 ada4 ONLINE 0 0 0 ada6 ONLINE 0 0 0 raidz1-1 ONLINE 0 0 0 ada0 ONLINE 0 0 0 ada1 ONLINE 0 0 0 ada2 ONLINE 0 0 0 ada3 ONLINE 0 0 0 errors: No known data errors and a number of file systems zbackup1 5241690240 2248788502 2992901738 43% /zbackup1 zbackup1/archive 2992901771 33 2992901738 0% /zbackup1/archive zbackup1/cust1 3254254853 261353115 2992901738 8% /zbackup1/cust1 When I would change to /zbackup1/cust1/.zfs and do a ls -l # ls -l ls: snapshot: Bad file descriptor total 4 dr-xr-xr-x 4 root wheel - 4 Mar 4 08:43 . drwxr-xr-x 21 root wheel - 21 Jun 29 11:46 .. 
dr-xr-xr-x 2 root wheel - 2 Mar 4 08:43 shares snapshot was set to visible zbackup1/cust1 snapdir visible inherited from zbackup1 And I could even list them in zfs get all zbackup1/cust1@20110715 type snapshot - zbackup1/cust1@20110715 creation Fri Jul 15 8:10 2011 - zbackup1/cust1@20110715 used 7.41G - But I could never change to the directory and do an ls -l, let along get files I ran a full scrub, but it did not help. I did a reboot and all worked after that. # ls -l total 4 dr-xr-xr-x 4 root wheel - 4 Mar 4 08:43 . drwxr-xr-x 21 root wheel - 21 Jun 29 11:46 .. dr-xr-xr-x 2 root wheel - 2 Mar 4 08:43 shares dr-xr-xr-x 5 root wheel - 5 Jul 26 12:11 snapshot # cd snapshot/ # ls -l total 6 dr-xr-xr-x 5 root wheel - 5 Jul 26 12:11 . dr-xr-xr-x 4 root wheel - 4 Mar 4 08:43 .. drwxr-xr-x 20 root wheel - 21 Jun 29 11:46 20110715 drwxr-xr-x 20 root wheel - 21 Jun 29 11:46 20110722 drwxr-xr-x 20 root wheel - 21 Jun 29 11:46 test In the future, are there any other things I can do to fix the issue short of rebooting ? ---Mike -- ------------------- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, mike@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/ From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 02:21:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A22D106567A for ; Wed, 27 Jul 2011 02:21:28 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx1.freebsd.org (Postfix) with ESMTP id BC9268FC17 for ; Wed, 27 Jul 2011 02:21:27 +0000 (UTC) Received: by yic13 with SMTP id 13so955952yic.13 for ; Tue, 26 Jul 2011 19:21:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=OWP5iWE+/Ai/6dO5Kbhdt1zcily/Cnl3UxokZIHFcnE=; b=Jlza4NTimdiKa3TWnKY3n5OfJAp8dZtsKFNNGVJFrDDkuTbW3E0oKO0cMyE8AGrMwi RCL0Qn8XWpeiZwxVs29BRWBTy6HJ2xtNxIGkl4awz6nhPWztoaJ/Ps1ZY/Eo4L6/DE7t 4PiQdRw5ctCHY3tQ6F+LyCDvl1vIz8Kz9kUn0= MIME-Version: 1.0 Received: by 10.236.136.226 with SMTP id w62mr8008259yhi.93.1311731797540; Tue, 26 Jul 2011 18:56:37 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.236.108.129 with HTTP; Tue, 26 Jul 2011 18:56:37 -0700 (PDT) In-Reply-To: <20110726142156.GJ17489@deviant.kiev.zoral.com.ua> References: <20110726090441.GD17489@deviant.kiev.zoral.com.ua> <429452924.1012322.1311689248782.JavaMail.root@erie.cs.uoguelph.ca> <20110726142156.GJ17489@deviant.kiev.zoral.com.ua> Date: Wed, 27 Jul 2011 03:56:37 +0200 X-Google-Sender-Auth: 8Uk8oDYksBsC2CPzvXTYBymqk-c Message-ID: From: Attilio Rao To: Kostik Belousov Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: FreeBSD FS Subject: Re: Does msodsfs_readdir() require a exclusively locked vnode X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 02:21:28 -0000 2011/7/26 Kostik Belousov : > On Tue, Jul 26, 2011 at 10:07:28AM -0400, Rick Macklem wrote: >> Kostik Belousov wrote: >> > On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote: >> > > Hi, >> > > >> > > Currently both NFS servers set the vnode lock LK_SHARED >> 
> > and so do the local syscalls (at least that's how it looks >> > > by inspection?). >> > > >> > > Peter Holm just posted me this panic, where a test for an >> > > exclusive vnode lock fails in msdosfs_readdir(). >> > > KDB: stack backtrace: >> > > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,.= ..) >> > > at db_trace_self_wrapper+0x26 >> > > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at >> > > kdb_backtrace+0x2a >> > > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23 >> > > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at >> > > assert_vop_elocked+0x55 >> > > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45 >> > > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at >> > > msdosfs_readdir+0x528 >> > > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at >> > > VOP_READDIR_APV+0xc5 >> > > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at >> > > nfsrvd_readdir+0x38e >> > > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at >> > > nfsrvd_dorpc+0x1f79 >> > > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at >> > > nfssvc_program+0x40f >> > > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) >> > > at svc_run_internal+0x952 >> > > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) at >> > > svc_thread_start+0x10 >> > > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 >> > > fork_trampoline() at fork_trampoline+0x8 >> > > --- trap 0x804c12e, eip =3D 0xc, esp =3D 0x33, ebp =3D 0x1 --- >> > > pcbmap: 0xc7f20ae0 is not exclusive locked but should be >> > > KDB: enter: lock violation >> > > >> > > So, does anyone know if the msdosfs_readdir() really requires a >> > > LK_EXCLUSIVE >> > > locked vnode or is the ASSERT_VOP_ELOCKED() too strong in pcbmap()? >> > >> > Yes, msdosfs currently requires all vnode locks to be exclusive. One >> > of >> > the reasons is that each denode (the msdosfs-private vnode data) >> > carries >> > the fat entries cache, and this cache is updated even by the >> > operations >> > that do not modify vnode from the VFS POV. >> > >> > The locking regime is enforced by the getnewvnode() initializing the >> > vnode >> > lock with LK_NOSHARE flag, and msdosfs code not calling >> > VN_LOCK_ASHARE() >> > on the newly instantiated vnode. >> > >> > My question is, was the vnode in question locked at all ? >> I think the problem is that I do a LK_DOWNGRADE. From a quick >> look at __lockmgr_args(), it doesn't check LK_NOSHARE for a >> LK_DOWNGRADE. >> >> Maybe __lockmgr_args() should have something like: >> =C2=A0 =C2=A0if (op =3D=3D LK_DOWNGRADE && (lk->lock_object.lo_flags & L= K_NOSHARE)) >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 return (0); =C2=A0 /* noop */ >> after the >> =C2=A0 =C2=A0if (op =3D=3D LK_SHARED && (lk->lock_object.lo_flags & LK_N= OSHARE)) >> =C2=A0 =C2=A0 =C2=A0 =C2=A0 op =3D LK_EXCLUSIVE; >> lines? > The RELENG_7 lockmgr does not check the NOSHARE flag on downgrade, > but I agree with the essence of your proposal. As long as the difference in semantic with the old lockmgr is correctly stressed out in the doc (and eventually comments) I'm fine with this change. Attilio --=20 Peace can only be achieved by understanding - A. 
Einstein From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 03:10:51 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5FB86106566C; Wed, 27 Jul 2011 03:10:51 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 36BAB8FC12; Wed, 27 Jul 2011 03:10:51 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p6R3Apxc002723; Wed, 27 Jul 2011 03:10:51 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p6R3AoV1002664; Wed, 27 Jul 2011 03:10:50 GMT (envelope-from linimon) Date: Wed, 27 Jul 2011 03:10:50 GMT Message-Id: <201107270310.p6R3AoV1002664@freefall.freebsd.org> To: universite@ukr.net, linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/159210: [zfs] [hang] ZFS (scrub???) freezes system X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 03:10:51 -0000 Old Synopsis: [ZFS] ZFS (scrub???) freezes system New Synopsis: [zfs] [hang] ZFS (scrub???) freezes system State-Changed-From-To: open->closed State-Changed-By: linimon State-Changed-When: Wed Jul 27 03:09:35 UTC 2011 State-Changed-Why: Duplicate of kern/159045. Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 27 03:09:35 UTC 2011 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=159210 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 07:52:14 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1BC03106564A for ; Wed, 27 Jul 2011 07:52:14 +0000 (UTC) (envelope-from gerrit@pmp.uni-hannover.de) Received: from mrelay1.uni-hannover.de (mrelay1.uni-hannover.de [130.75.2.106]) by mx1.freebsd.org (Postfix) with ESMTP id A3E0C8FC08 for ; Wed, 27 Jul 2011 07:52:13 +0000 (UTC) Received: from www.pmp.uni-hannover.de (www.pmp.uni-hannover.de [130.75.117.2]) by mrelay1.uni-hannover.de (8.14.4/8.14.4) with ESMTP id p6R7q3qR026990; Wed, 27 Jul 2011 09:52:07 +0200 Received: from pmp.uni-hannover.de (unknown [130.75.117.3]) by www.pmp.uni-hannover.de (Postfix) with SMTP id 9682E72; Wed, 27 Jul 2011 09:52:03 +0200 (CEST) Date: Wed, 27 Jul 2011 09:52:03 +0200 From: Gerrit =?ISO-8859-1?Q?K=FChn?= To: Mike Tancsa Message-Id: <20110727095203.50f3c0d6.gerrit@pmp.uni-hannover.de> In-Reply-To: <4E2F2122.6080204@sentex.net> References: <4E2F2122.6080204@sentex.net> Organization: Albert-Einstein-Institut (MPI =?ISO-8859-1?Q?f=FCr?= Gravitationsphysik & IGP =?ISO-8859-1?Q?Universit=E4t?= Hannover) X-Mailer: Sylpheed 3.0.3 (GTK+ 2.22.1; amd64-portbld-freebsd8.1) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-PMX-Version: 5.5.9.395186, Antispam-Engine: 2.7.2.376379, Antispam-Data: 2011.7.27.73314 Cc: freebsd-fs@freebsd.org Subject: Re: zfs error - snapshot: Bad file descriptor X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: gerrit.kuehn@aei.mpg.de List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 07:52:14 -0000 On Tue, 26 Jul 2011 16:18:42 -0400 Mike Tancsa wrote about zfs error - snapshot: Bad file descriptor: MT> I googled around for an answer to this, but other than reboot, I never MT> found any other strategies. MT> When I would change to /zbackup1/cust1/.zfs MT> MT> and do a MT> ls -l MT> MT> # ls -l MT> ls: snapshot: Bad file descriptor Just for the record: I have exactly the same issue here with zfsv28 and 8-stable from 14th of July. cu Gerrit From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 08:06:31 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0E95A1065673; Wed, 27 Jul 2011 08:06:31 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id DB9808FC0C; Wed, 27 Jul 2011 08:06:30 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p6R86Ukh008412; Wed, 27 Jul 2011 08:06:30 GMT (envelope-from mm@freefall.freebsd.org) Received: (from mm@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p6R86UiA008408; Wed, 27 Jul 2011 08:06:30 GMT (envelope-from mm) Date: Wed, 27 Jul 2011 08:06:30 GMT Message-Id: <201107270806.p6R86UiA008408@freefall.freebsd.org> To: miks.mikelsons@gmail.com, mm@FreeBSD.org, freebsd-fs@FreeBSD.org From: mm@FreeBSD.org Cc: Subject: Re: kern/142914: [zfs] ZFS performance degradation over time X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 08:06:31 -0000 Synopsis: [zfs] ZFS performance degradation over time State-Changed-From-To: open->closed State-Changed-By: mm State-Changed-When: Wed Jul 27 08:06:30 UTC 2011 State-Changed-Why: Closed on submitter request. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=142914 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 11:27:11 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C4CC71065672 for ; Wed, 27 Jul 2011 11:27:11 +0000 (UTC) (envelope-from prvs=11896da94d=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id C99718FC0C for ; Wed, 27 Jul 2011 11:27:10 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 12:16:25 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 12:16:25 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014339848.msg for ; Wed, 27 Jul 2011 12:16:24 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=11896da94d=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: From: "Steven Hartland" To: Date: Wed, 27 Jul 2011 12:16:50 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: Subject: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 11:27:11 -0000 Got a machine which is hung accessing a specific zfs pool, other volumes seem unaffected, there are no errors showing on zpool status and no errors in /var/log/messages Processes seem to be hung in a variety of states including:- STOP, zfs, zio->i, db->db, tx->tx zfs list also hangs.
Here's some procstat -k -k from some hung processes and the output from zfs-stats -a The machine is running 8.2-RELEASE procstat -k -k 94003 PID TID COMM TDNAME KSTACK 94003 100341 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 100417 java initial thread mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_wait+0x72a __umtx_op_wait+0x5e syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 100459 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 100536 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 100544 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _cv_timedwait_sig+0x134 seltdwait+0x98 poll+0x2f8 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 100751 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 100925 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101268 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101319 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101417 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101486 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101498 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101555 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_wait+0x72a __umtx_op_wait_uint_private+0x64 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101563 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101565 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101566 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101804 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101897 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c 
syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101971 java - mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x305 _sx_xlock+0x4e txg_hold_open+0x4f dmu_tx_assign+0x189 zfs_freebsd_create+0x2f2 VOP_CREATE_APV+0x31 vn_open_cred+0x4ab kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 101984 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 102164 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _cv_wait_sig+0x128 seltdwait+0x110 poll+0x2f8 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 102627 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103546 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103636 java - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x39a dbuf_findbp+0xf7 dbuf_hold_impl+0xc2 dbuf_hold_level+0x1a dmu_tx_check_ioerr+0x52 dmu_tx_count_write+0x297 dmu_tx_hold_write+0x4a zfs_freebsd_write+0x397 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x8b kern_writev+0x60 write+0x55 syscallenter+0x1e5 94003 103676 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103728 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103748 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103749 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103750 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 94003 103751 java - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x39a dbuf_findbp+0xf7 dbuf_hold_impl+0xc2 dbuf_hold_level+0x1a dmu_tx_check_ioerr+0x52 dmu_tx_count_write+0x297 dmu_tx_hold_write+0x4a zfs_freebsd_write+0x397 VOP_WRITE_APV+0xb2 vn_write+0x2d7 dofilewrite+0x8b kern_writev+0x60 write+0x55 syscallenter+0x1e5 procstat -k -k 39568 PID TID COMM TDNAME KSTACK 39568 100303 find - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x39a dmu_buf_hold+0xcc zap_lockdir+0x55 zap_cursor_retrieve+0x194 zfs_freebsd_readdir+0x2b6 kern_getdirentries+0x217 getdirentries+0x23 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 zfs-stats -a ------------------------------------------------------------------------ ZFS Subsystem Report Wed Jul 27 10:57:12 2011 ------------------------------------------------------------------------ System Information: Kernel Version: 802000 (osreldate) Hardware Platform: amd64 Processor Architecture: amd64 FreeBSD 8.2-RELEASE #0: Fri Mar 18 10:58:44 UTC 2011 root 10:57AM up 112 days, 21 mins, 1 user, load averages: 0.56, 0.39, 0.34 
------------------------------------------------------------------------ System Memory Statistics: Physical Memory: 24555.62M Kernel Memory: 2076.86M DATA: 99.50% 2066.54M TEXT: 0.50% 10.32M ------------------------------------------------------------------------ ZFS pool information: Storage pool Version (spa): 15 Filesystem Version (zpl): 4 ------------------------------------------------------------------------ ARC Misc: Deleted: 122887476 Recycle Misses: 2985802 Mutex Misses: 51968 Evict Skips: 51968 ARC Size: Current Size (arcsize): 12.50% 2847.37M Target Size (Adaptive, c): 12.50% 2847.48M Min Size (Hard Limit, c_min): 12.50% 2847.48M Max Size (High Water, c_max): ~8:1 22779.81M ARC Size Breakdown: Recently Used Cache Size (p): 29.47% 839.09M Freq. Used Cache Size (c-p): 70.53% 2008.38M ARC Hash Breakdown: Elements Max: 1093246 Elements Current: 34.92% 381795 Collisions: 496215463 Chain Max: 11 Chains: 86474 ARC Eviction Statistics: Evicts Total: 2581704887296 Evicts Eligible for L2: 93.73% 2419921712128 Evicts Ineligible for L2: 6.27% 161783175168 Evicts Cached to L2: 0 ARC Efficiency: Cache Access Total: 17042848480 Cache Hit Ratio: 99.85% 17017691729 Cache Miss Ratio: 0.15% 25156751 Actual Hit Ratio: 88.30% 15049590290 Data Demand Efficiency: 99.90% Data Prefetch Efficiency: 86.74% CACHE HITS BY CACHE LIST: Anonymously Used: 11.48% 1952878601 Most Recently Used (mru): 5.30% 901233481 Most Frequently Used (mfu): 83.14% 14148356809 MRU Ghost (mru_ghost): 0.03% 5560514 MFU Ghost (mfu_ghost): 0.06% 9662324 CACHE HITS BY DATA TYPE: Demand Data: 65.68% 11176417108 Prefetch Data: 0.26% 44828929 Demand Metadata: 6.89% 1173279354 Prefetch Metadata: 27.17% 4623166338 CACHE MISSES BY DATA TYPE: Demand Data: 44.65% 11232287 Prefetch Data: 27.24% 6852876 Demand Metadata: 23.17% 5829877 Prefetch Metadata: 4.94% 1241711 ------------------------------------------------------------------------ VDEV Cache Summary: Access Total: 6646743 Hits Ratio: 67.29% 4472518 Miss Ratio: 32.71% 2174225 Delegations: 350939 ------------------------------------------------------------------------ File-Level Prefetch Stats (DMU): DMU Efficiency: Access Total: 119890457768 Hit Ratio: 91.27% 109423001264 Miss Ratio: 8.73% 10467456504 Colinear Access Total: 10467456504 Colinear Hit Ratio: 0.01% 632312 Colinear Miss Ratio: 99.99% 10466824192 Stride Access Total: 107333142359 Stride Hit Ratio: 99.99% 107326197075 Stride Miss Ratio: 0.01% 6945284 DMU misc: Reclaim successes: 2967491512 Reclaim failures: 7499332680 Stream resets: 58396 Stream noresets: 1273657946 Bogus streams: 0 ------------------------------------------------------------------------ ZFS Tunable (sysctl): kern.maxusers=384 vfs.zfs.l2c_only_size=0 vfs.zfs.mfu_ghost_data_lsize=495015424 vfs.zfs.mfu_ghost_metadata_lsize=96888320 vfs.zfs.mfu_ghost_size=591903744 vfs.zfs.mfu_data_lsize=17817088 vfs.zfs.mfu_metadata_lsize=286173184 vfs.zfs.mfu_size=563621376 vfs.zfs.mru_ghost_data_lsize=1540046336 vfs.zfs.mru_ghost_metadata_lsize=849328128 vfs.zfs.mru_ghost_size=2389374464 vfs.zfs.mru_data_lsize=232126464 vfs.zfs.mru_metadata_lsize=118643712 vfs.zfs.mru_size=531284992 vfs.zfs.anon_data_lsize=0 vfs.zfs.anon_metadata_lsize=0 vfs.zfs.anon_size=68533248 vfs.zfs.l2arc_norw=1 vfs.zfs.l2arc_feed_again=1 vfs.zfs.l2arc_noprefetch=0 vfs.zfs.l2arc_feed_min_ms=200 vfs.zfs.l2arc_feed_secs=1 vfs.zfs.l2arc_headroom=2 vfs.zfs.l2arc_write_boost=8388608 vfs.zfs.l2arc_write_max=8388608 vfs.zfs.arc_meta_limit=5971591168 vfs.zfs.arc_meta_used=2665786824 vfs.zfs.mdcomp_disable=0 
vfs.zfs.arc_min=2985795584 vfs.zfs.arc_max=23886364672 vfs.zfs.zfetch.array_rd_sz=1048576 vfs.zfs.zfetch.block_cap=256 vfs.zfs.zfetch.min_sec_reap=2 vfs.zfs.zfetch.max_streams=8 vfs.zfs.prefetch_disable=0 vfs.zfs.check_hostid=1 vfs.zfs.recover=0 vfs.zfs.txg.write_limit_override=0 vfs.zfs.txg.synctime=5 vfs.zfs.txg.timeout=30 vfs.zfs.scrub_limit=10 vfs.zfs.vdev.cache.bshift=16 vfs.zfs.vdev.cache.size=10485760 vfs.zfs.vdev.cache.max=16384 vfs.zfs.vdev.aggregation_limit=131072 vfs.zfs.vdev.ramp_rate=2 vfs.zfs.vdev.time_shift=6 vfs.zfs.vdev.min_pending=4 vfs.zfs.vdev.max_pending=10 vfs.zfs.cache_flush_disable=0 vfs.zfs.zil_disable=0 vfs.zfs.zio.use_uma=0 vfs.zfs.version.zpl=4 vfs.zfs.version.spa=15 vfs.zfs.version.dmu_backup_stream=1 vfs.zfs.version.dmu_backup_header=2 vfs.zfs.version.acl=1 vfs.zfs.debug=0 vfs.zfs.super_owner=0 vm.kmem_size=24960106496 vm.kmem_size_scale=1 vm.kmem_size_min=0 vm.kmem_size_max=329853485875 ------------------------------------------------------------------------ procstat -k -k 36541 PID TID COMM TDNAME KSTACK 36541 100971 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 101251 java initial thread mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_wait+0x72a __umtx_op_wait+0x5e syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 101591 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 101648 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 101781 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 101798 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 101993 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102037 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102059 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102191 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_wait+0x72a __umtx_op_wait_uint_private+0x64 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102208 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102210 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102221 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102253 java - 
mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102279 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102298 java - mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x305 _sx_xlock+0x4e txg_hold_open+0x4f dmu_tx_assign+0x189 zfs_freebsd_create+0x2f2 VOP_CREATE_APV+0x31 vn_open_cred+0x4ab kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102426 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102594 java - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dmu_buf_hold_array_by_dnode+0x217 dmu_buf_hold_array+0x6a dmu_read_uio+0x3f zfs_freebsd_read+0x5d3 vn_read+0x2cc dofileread+0xa1 kern_readv+0x60 read+0x55 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 102775 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 103293 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 103790 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 103791 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36541 103797 java - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 zio_wait+0x61 dbuf_read+0x39a dmu_buf_hold+0xcc zap_lockdir+0x55 zap_lookup_norm+0x45 zap_lookup+0x2e zfs_dirent_lock+0x534 zfs_dirlook+0x69 zfs_lookup+0x26b zfs_freebsd_lookup+0x81 vfs_cache_lookup+0xf0 VOP_LOOKUP_APV+0x40 lookup+0x452 namei+0x53a kern_statat_vnhook+0x8f 36541 103809 java - mi_switch+0x176 sleepq_wait+0x42 __lockmgr_args+0x75a vop_stdlock+0x39 VOP_LOCK1_APV+0x46 _vn_lock+0x47 vget+0x70 cache_lookup+0x50f vfs_cache_lookup+0xc0 VOP_LOOKUP_APV+0x40 lookup+0x452 namei+0x53a vn_open_cred+0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 procstat -k -k 36732 PID TID COMM TDNAME KSTACK 36732 100369 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100790 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100794 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100830 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100853 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100873 java - mi_switch+0x176 
sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100957 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 100977 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 101124 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_wait+0x72a __umtx_op_wait_uint_private+0x64 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 101130 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 101190 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 101326 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_cv_wait+0x871 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 102146 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 102257 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 102316 java - mi_switch+0x176 sleepq_wait+0x42 _sx_xlock_hard+0x305 _sx_xlock+0x4e txg_hold_open+0x4f dmu_tx_assign+0x189 zfs_freebsd_create+0x2f2 VOP_CREATE_APV+0x31 vn_open_cred+0x4ab kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 102484 java - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 dmu_buf_hold_array_by_dnode+0x28f dmu_buf_hold_array+0x6a dmu_read_uio+0x3f zfs_freebsd_read+0x5d3 vn_read+0x2cc dofileread+0xa1 kern_readv+0x60 read+0x55 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 102943 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _cv_wait_sig+0x128 seltdwait+0x110 poll+0x2f8 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 103102 java initial thread mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_wait_sig+0x16 _sleep+0x269 do_wait+0x72a __umtx_op_wait+0x5e syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 103255 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _sleep+0x1b1 do_cv_wait+0x640 __umtx_op_cv_wait+0x5c syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 103292 java - mi_switch+0x176 sleepq_catch_signals+0x31c sleepq_timedwait_sig+0x19 _cv_timedwait_sig+0x134 seltdwait+0x98 poll+0x2f8 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 36732 103481 java - mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 dmu_buf_hold_array_by_dnode+0x28f dmu_buf_hold_array+0x6a dmu_read_uio+0x3f zfs_freebsd_read+0x5d3 vn_read+0x2cc dofileread+0xa1 kern_readv+0x60 read+0x55 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. 
In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 12:06:46 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4188E106567C for ; Wed, 27 Jul 2011 12:06:46 +0000 (UTC) (envelope-from prvs=11896da94d=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id C037D8FC19 for ; Wed, 27 Jul 2011 12:06:45 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 13:06:14 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 13:06:14 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014340588.msg for ; Wed, 27 Jul 2011 13:06:13 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=11896da94d=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> From: "Steven Hartland" To: References: Date: Wed, 27 Jul 2011 13:06:52 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 12:06:46 -0000 I've checked the raw disk and all seems fine there, so does look like its some sort of zfs livelock. I'm trying to keep the machine available in case someone needs more information, but its a production machine so I'm going to have to reboot it in the next few hours. Disk tests:- dd if=/dev/da1 of=/dev/null bs=10m 5724+1 records in 5724+1 records out 60022480896 bytes transferred in 430.479894 secs (139431555 bytes/sec) smartctl -a /dev/da1 smartctl 5.40 2010-10-16 r3189 [FreeBSD 8.2-RELEASE amd64] (local build) Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net === START OF INFORMATION SECTION === Model Family: SandForce Driven SSDs Device Model: Corsair CSSD-F60GB2 Serial Number: 10446509320009990024 Firmware Version: 1.1 User Capacity: 60,022,480,896 bytes Device is: In smartctl database [for details use: -P show] ATA Version is: 8 ATA Standard is: ATA-8-ACS revision 6 Local Time is: Wed Jul 27 11:27:30 2011 UTC SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. 
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 0) seconds. Offline data collection capabilities: (0x7f) SMART execute Offline immediate. Auto Offline data collection on/off support. Abort Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 1) minutes. Extended self-test routine recommended polling time: ( 48) minutes. Conveyance self-test routine recommended polling time: ( 2) minutes. SCT capabilities: (0x003d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 119 100 050 Pre-fail Always - 0/238293224 5 Retired_Block_Count 0x0033 097 097 003 Pre-fail Always - 256 9 Power_On_Hours_and_Msec 0x0032 100 100 000 Old_age Always - 5513h+00m+39.450s 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 2 171 Program_Fail_Count 0x0000 000 000 000 Old_age Offline - 0 172 Erase_Fail_Count 0x0000 000 000 000 Old_age Offline - 0 174 Unexpect_Power_Loss_Ct 0x0030 000 000 000 Old_age Offline - 0 177 Wear_Range_Delta 0x0000 000 000 --- Old_age Offline - 1 181 Program_Fail_Count 0x0000 000 000 000 Old_age Offline - 0 182 Erase_Fail_Count 0x0000 000 000 000 Old_age Offline - 0 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0 194 Temperature_Celsius 0x0022 022 026 000 Old_age Always - 22 (Min/Max 0/26) 195 ECC_Uncorr_Error_Count 0x001c 119 100 000 Old_age Offline - 0/238293224 196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always - 0 231 SSD_Life_Left 0x0013 057 057 010 Pre-fail Always - 0 233 SandForce_Internal 0x0000 000 000 000 Old_age Offline - 152704 234 SandForce_Internal 0x0000 000 000 000 Old_age Offline - 90688 241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 90688 242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 3584 Error SMART Error Log Read failed: Input/output error Smartctl: SMART Error Log Read Failed Error SMART Error Self-Test Log Read failed: Input/output error Smartctl: SMART Self Test Log Read Failed SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 12:50:34 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3214A106564A; Wed, 27 Jul 2011 12:50:34 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id E748C8FC16; Wed, 27 Jul 2011 12:50:33 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 6BED346B5B; Wed, 27 Jul 2011 08:50:33 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id AC4CC8A02F; Wed, 27 Jul 2011 08:50:32 -0400 (EDT) From: John Baldwin To: freebsd-fs@freebsd.org Date: Wed, 27 Jul 2011 08:40:56 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <20110726090441.GD17489@deviant.kiev.zoral.com.ua> <20110726142156.GJ17489@deviant.kiev.zoral.com.ua> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201107270840.57104.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Wed, 27 Jul 2011 08:50:32 -0400 (EDT) Cc: Attilio Rao Subject: Re: Does msodsfs_readdir() require a exclusively locked vnode X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 12:50:34 -0000 On Tuesday, July 26, 2011 9:56:37 pm Attilio Rao wrote: > 2011/7/26 Kostik Belousov : > > On Tue, Jul 26, 2011 at 10:07:28AM -0400, Rick Macklem wrote: > >> Kostik Belousov wrote: > >> > On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote: > >> > > Hi, > >> > > > >> > > Currently both NFS servers set the vnode lock LK_SHARED > >> > > and so do the local syscalls (at least that's how it looks > >> > > by inspection?). > >> > > > >> > > Peter Holm just posted me this panic, where a test for an > >> > > exclusive vnode lock fails in msdosfs_readdir(). > >> > > KDB: stack backtrace: > >> > > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,...) > >> > > at db_trace_self_wrapper+0x26 > >> > > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) at > >> > > kdb_backtrace+0x2a > >> > > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at vfs_badlock+0x23 > >> > > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at > >> > > assert_vop_elocked+0x55 > >> > > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at pcbmap+0x45 > >> > > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at > >> > > msdosfs_readdir+0x528 > >> > > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at > >> > > VOP_READDIR_APV+0xc5 > >> > > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at > >> > > nfsrvd_readdir+0x38e > >> > > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at > >> > > nfsrvd_dorpc+0x1f79 > >> > > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) at > >> > > nfssvc_program+0x40f > >> > > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) > >> > > at svc_run_internal+0x952 > >> > > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) 
at > >> > > svc_thread_start+0x10 > >> > > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 > >> > > fork_trampoline() at fork_trampoline+0x8 > >> > > --- trap 0x804c12e, eip = 0xc, esp = 0x33, ebp = 0x1 --- > >> > > pcbmap: 0xc7f20ae0 is not exclusive locked but should be > >> > > KDB: enter: lock violation > >> > > > >> > > So, does anyone know if the msdosfs_readdir() really requires a > >> > > LK_EXCLUSIVE > >> > > locked vnode or is the ASSERT_VOP_ELOCKED() too strong in pcbmap()? > >> > > >> > Yes, msdosfs currently requires all vnode locks to be exclusive. One > >> > of > >> > the reasons is that each denode (the msdosfs-private vnode data) > >> > carries > >> > the fat entries cache, and this cache is updated even by the > >> > operations > >> > that do not modify vnode from the VFS POV. > >> > > >> > The locking regime is enforced by the getnewvnode() initializing the > >> > vnode > >> > lock with LK_NOSHARE flag, and msdosfs code not calling > >> > VN_LOCK_ASHARE() > >> > on the newly instantiated vnode. > >> > > >> > My question is, was the vnode in question locked at all ? > >> I think the problem is that I do a LK_DOWNGRADE. From a quick > >> look at __lockmgr_args(), it doesn't check LK_NOSHARE for a > >> LK_DOWNGRADE. > >> > >> Maybe __lockmgr_args() should have something like: > >> if (op == LK_DOWNGRADE && (lk->lock_object.lo_flags & LK_NOSHARE)) > >> return (0); /* noop */ > >> after the > >> if (op == LK_SHARED && (lk->lock_object.lo_flags & LK_NOSHARE)) > >> op = LK_EXCLUSIVE; > >> lines? > > The RELENG_7 lockmgr does not check the NOSHARE flag on downgrade, > > but I agree with the essence of your proposal. > > As long as the difference in semantic with the old lockmgr is > correctly stressed out in the doc (and eventually comments) I'm fine > with this change. I think it is a bug in the LK_NOSHARE implementation if the old lockmgr() didn't silently nop downgrade requests when LK_NOSHARE was set. :) We should definitely fix it to ignore downgrades for LK_NOSHARE. 
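A minimal sketch of the check being discussed, patterned on the lines Rick quoted above; the exact form
and placement inside __lockmgr_args() in sys/kern/kern_lock.c are assumptions here, not the committed change:

	/*
	 * Sketch only, not the actual commit.  __lockmgr_args() already
	 * turns shared requests into exclusive ones for LK_NOSHARE locks;
	 * the proposal is to likewise treat a downgrade of an LK_NOSHARE
	 * lock as a no-op, since such a lock is never really held shared.
	 */
	if (op == LK_SHARED && (lk->lock_object.lo_flags & LK_NOSHARE))
		op = LK_EXCLUSIVE;	/* existing behaviour */
	if (op == LK_DOWNGRADE && (lk->lock_object.lo_flags & LK_NOSHARE))
		return (0);		/* proposed: silently ignore the downgrade */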
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 13:45:42 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 52B11106566C for ; Wed, 27 Jul 2011 13:45:42 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id AEFEF8FC1B for ; Wed, 27 Jul 2011 13:45:41 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA04058; Wed, 27 Jul 2011 16:34:24 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E3013DF.10803@FreeBSD.org> Date: Wed, 27 Jul 2011 16:34:23 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> In-Reply-To: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 13:45:42 -0000 on 27/07/2011 15:06 Steven Hartland said the following: > I've checked the raw disk and all seems fine there, so does look like its > some sort of zfs livelock. > > I'm trying to keep the machine available in case someone needs more information, > but its a production machine so I'm going to have to reboot it in the next > few hours. > > Disk tests:- > > dd if=/dev/da1 of=/dev/null bs=10m 5724+1 records in > 5724+1 records out > 60022480896 bytes transferred in 430.479894 secs (139431555 bytes/sec) > > > smartctl -a /dev/da1 Is this the only disk associated with the troubled pool? > smartctl 5.40 2010-10-16 r3189 [FreeBSD 8.2-RELEASE amd64] (local build) > Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net > > === START OF INFORMATION SECTION === > Model Family: SandForce Driven SSDs > Device Model: Corsair CSSD-F60GB2 > Serial Number: 10446509320009990024 > Firmware Version: 1.1 > User Capacity: 60,022,480,896 bytes > Device is: In smartctl database [for details use: -P show] > ATA Version is: 8 > ATA Standard is: ATA-8-ACS revision 6 > Local Time is: Wed Jul 27 11:27:30 2011 UTC > SMART support is: Available - device has SMART capability. > SMART support is: Enabled > > === START OF READ SMART DATA SECTION === > SMART overall-health self-assessment test result: PASSED > > General SMART Values: > Offline data collection status: (0x00) Offline data collection activity > was never started. > Auto Offline Data Collection: Disabled. > Self-test execution status: ( 0) The previous self-test routine completed > without error or no self-test has ever > been run. > Total time to complete Offline data collection: ( 0) seconds. > Offline data collection > capabilities: (0x7f) SMART execute Offline immediate. > Auto Offline data collection on/off support. > Abort Offline collection upon new > command. > Offline surface scan supported. > Self-test supported. > Conveyance Self-test supported. > Selective Self-test supported. > SMART capabilities: (0x0003) Saves SMART data before entering > power-saving mode. 
> Supports SMART auto save timer. > Error logging capability: (0x01) Error logging supported. > General Purpose Logging supported. > Short self-test routine recommended polling time: ( 1) minutes. > Extended self-test routine > recommended polling time: ( 48) minutes. > Conveyance self-test routine > recommended polling time: ( 2) minutes. > SCT capabilities: (0x003d) SCT Status supported. > SCT Error Recovery Control supported. > SCT Feature Control supported. > SCT Data Table supported. > > SMART Attributes Data Structure revision number: 10 > Vendor Specific SMART Attributes with Thresholds: > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED > WHEN_FAILED RAW_VALUE > 1 Raw_Read_Error_Rate 0x000f 119 100 050 Pre-fail Always > - 0/238293224 > 5 Retired_Block_Count 0x0033 097 097 003 Pre-fail Always > - 256 > 9 Power_On_Hours_and_Msec 0x0032 100 100 000 Old_age Always > - 5513h+00m+39.450s > 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always > - 2 > 171 Program_Fail_Count 0x0000 000 000 000 Old_age Offline > - 0 > 172 Erase_Fail_Count 0x0000 000 000 000 Old_age Offline > - 0 > 174 Unexpect_Power_Loss_Ct 0x0030 000 000 000 Old_age Offline > - 0 > 177 Wear_Range_Delta 0x0000 000 000 --- Old_age Offline > - 1 > 181 Program_Fail_Count 0x0000 000 000 000 Old_age Offline > - 0 > 182 Erase_Fail_Count 0x0000 000 000 000 Old_age Offline > - 0 > 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always > - 0 > 194 Temperature_Celsius 0x0022 022 026 000 Old_age Always > - 22 (Min/Max 0/26) > 195 ECC_Uncorr_Error_Count 0x001c 119 100 000 Old_age Offline > - 0/238293224 > 196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always > - 0 > 231 SSD_Life_Left 0x0013 057 057 010 Pre-fail Always > - 0 > 233 SandForce_Internal 0x0000 000 000 000 Old_age Offline > - 152704 > 234 SandForce_Internal 0x0000 000 000 000 Old_age Offline > - 90688 > 241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always > - 90688 > 242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always > - 3584 > > Error SMART Error Log Read failed: Input/output error > Smartctl: SMART Error Log Read Failed > Error SMART Error Self-Test Log Read failed: Input/output error > Smartctl: SMART Self Test Log Read Failed > SMART Selective self-test log data structure revision number 1 > SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS > 1 0 0 Not_testing > 2 0 0 Not_testing > 3 0 0 Not_testing > 4 0 0 Not_testing > 5 0 0 Not_testing > Selective self-test flags (0x0): > After scanning selected spans, do NOT read-scan remainder of disk. > If Selective self-test is pending on power-up, resume after 0 minute delay. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 13:55:30 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D2296106566B for ; Wed, 27 Jul 2011 13:55:30 +0000 (UTC) (envelope-from prvs=11896da94d=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 5FACC8FC12 for ; Wed, 27 Jul 2011 13:55:29 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 14:54:58 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 14:54:57 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014341803.msg; Wed, 27 Jul 2011 14:54:56 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=11896da94d=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> Date: Wed, 27 Jul 2011 14:55:36 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 13:55:30 -0000 ----- Original Message ----- From: "Andriy Gapon" >> smartctl -a /dev/da1 > > Is this the only disk associated with the troubled pool? Yes, there's two disks in the machine 1 x 500GB HD (root etc) and 1 x 60GB SSD which is the the pool we're having issues with. As you can see a full raw device dd completed fine. My admins tell me they have had a number of cases like this requiring a power cycle after which all is fine. Apparently it seems to affect machines with high uptimes, if thats of help. This machine shows:- uptime 1:55PM up 112 days, 3:19, 1 user, load averages: 0.00, 0.00, 0.00 Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 14:10:33 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EBD921065670 for ; Wed, 27 Jul 2011 14:10:33 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 2F01C8FC13 for ; Wed, 27 Jul 2011 14:10:32 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA04530; Wed, 27 Jul 2011 17:10:30 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E301C55.7090105@FreeBSD.org> Date: Wed, 27 Jul 2011 17:10:29 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> In-Reply-To: <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 14:10:34 -0000 on 27/07/2011 16:55 Steven Hartland said the following: > Apparently it seems to affect machines > with high uptimes, if thats of help. This machine shows:- > > uptime > 1:55PM up 112 days, 3:19, 1 user, load averages: 0.00, 0.00, 0.00 Just a guess, perhaps it's another manifestation of this issue: http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 14:18:32 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 111261065670; Wed, 27 Jul 2011 14:18:32 +0000 (UTC) (envelope-from prvs=11896da94d=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 5FE5D8FC13; Wed, 27 Jul 2011 14:18:31 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 15:17:59 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 15:17:59 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014342069.msg; Wed, 27 Jul 2011 15:17:59 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=11896da94d=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> Date: Wed, 27 Jul 2011 15:18:38 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit 
X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 14:18:32 -0000 ----- Original Message ----- From: "Andriy Gapon" To: "Steven Hartland" Cc: Sent: Wednesday, July 27, 2011 3:10 PM Subject: Re: zfs process hang on pool access > on 27/07/2011 16:55 Steven Hartland said the following: >> Apparently it seems to affect machines >> with high uptimes, if thats of help. This machine shows:- >> >> uptime >> 1:55PM up 112 days, 3:19, 1 user, load averages: 0.00, 0.00, 0.00 > > Just a guess, perhaps it's another manifestation of this issue: > http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html Not sure looking at that thread and comparing to the machine:- top -SHb 500 | grep arc 7 root -8 - 0K 88K arc_re 0 11:07 0.00% {arc_reclaim_thre} 7 root -8 - 0K 88K l2arc_ 2 0:52 0.00% {l2arc_feed_threa} So no excessive cpu for reclaim is present and evict_skip is not incrementing: sysctl kstat.zfs.misc.arcstats.evict_skip kstat.zfs.misc.arcstats.evict_skip: 235572240 sleep 60 sysctl kstat.zfs.misc.arcstats.evict_skip kstat.zfs.misc.arcstats.evict_skip: 235572240 Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
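For anyone wanting to watch that counter over a longer window, a tiny sketch that polls the same kstat
sysctl shown above from C instead of re-running sysctl(8) by hand; it assumes a FreeBSD system with the
ZFS module loaded and keeps error handling minimal:

	/*
	 * Poll kstat.zfs.misc.arcstats.evict_skip once a minute, mirroring
	 * the manual "sysctl ... ; sleep 60 ; sysctl ..." check above.
	 * Stop it with Ctrl-C.
	 */
	#include <sys/types.h>
	#include <sys/sysctl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>

	int
	main(void)
	{
		uint64_t val;
		size_t len;

		for (;;) {
			len = sizeof(val);
			if (sysctlbyname("kstat.zfs.misc.arcstats.evict_skip",
			    &val, &len, NULL, 0) == -1) {
				perror("sysctlbyname");
				return (1);
			}
			printf("evict_skip: %ju\n", (uintmax_t)val);
			sleep(60);
		}
	}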
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 14:22:13 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E1C82106564A for ; Wed, 27 Jul 2011 14:22:13 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 3E3298FC15 for ; Wed, 27 Jul 2011 14:22:12 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA04726; Wed, 27 Jul 2011 17:22:09 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E301F10.6060708@FreeBSD.org> Date: Wed, 27 Jul 2011 17:22:08 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> In-Reply-To: <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 14:22:14 -0000 on 27/07/2011 17:18 Steven Hartland said the following: > > ----- Original Message ----- From: "Andriy Gapon" > To: "Steven Hartland" > Cc: > Sent: Wednesday, July 27, 2011 3:10 PM > Subject: Re: zfs process hang on pool access > > >> on 27/07/2011 16:55 Steven Hartland said the following: >>> Apparently it seems to affect machines >>> with high uptimes, if thats of help. This machine shows:- >>> >>> uptime >>> 1:55PM up 112 days, 3:19, 1 user, load averages: 0.00, 0.00, 0.00 >> >> Just a guess, perhaps it's another manifestation of this issue: >> http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html > > Not sure looking at that thread and comparing to the machine:- I meant the same root cause, not the same symptoms, of course. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 14:32:35 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 201C31065670; Wed, 27 Jul 2011 14:32:35 +0000 (UTC) (envelope-from prvs=11896da94d=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 6E32E8FC0C; Wed, 27 Jul 2011 14:32:34 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 15:32:02 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 27 Jul 2011 15:32:02 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014342226.msg; Wed, 27 Jul 2011 15:32:02 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=11896da94d=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk> From: "Steven Hartland" To: "Andriy Gapon" References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> <4E301F10.6060708@FreeBSD.org> Date: Wed, 27 Jul 2011 15:32:41 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 14:32:35 -0000 ----- Original Message ----- From: "Andriy Gapon" >>> Just a guess, perhaps it's another manifestation of this issue: >>> http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html >> >> Not sure looking at that thread and comparing to the machine:- > > I meant the same root cause, not the same symptoms, of course. Ahh, is there anyway to confirm that before I reboot, or any other information we could glean that might be useful? Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 14:34:49 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 67770106564A for ; Wed, 27 Jul 2011 14:34:49 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id A60438FC0A for ; Wed, 27 Jul 2011 14:34:48 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA05040; Wed, 27 Jul 2011 17:34:44 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E302204.2030009@FreeBSD.org> Date: Wed, 27 Jul 2011 17:34:44 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Steven Hartland References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> <4E301F10.6060708@FreeBSD.org> <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk> In-Reply-To: <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk> X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 14:34:49 -0000 on 27/07/2011 17:32 Steven Hartland said the following: > ----- Original Message ----- From: "Andriy Gapon" >>>> Just a guess, perhaps it's another manifestation of this issue: >>>> http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html >>> >>> Not sure looking at that thread and comparing to the machine:- >> >> I meant the same root cause, not the same symptoms, of course. > > Ahh, is there anyway to confirm that before I reboot, or any other > information we could glean that might be useful? No quick ideas, unfortunately. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 17:56:10 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3CD781065673; Wed, 27 Jul 2011 17:56:10 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 70FB08FC12; Wed, 27 Jul 2011 17:56:09 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: As4AAPpPME6DaFvO/2dsb2JhbAA1AQEFKQRGEh0OCgICDQceAhYSPwcXhFaTLJA/uWyRSIErgXuCC4EPBJJ1iDOBOIcT X-IronPort-AV: E=Sophos;i="4.67,277,1309752000"; d="scan'208";a="132455253" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 27 Jul 2011 13:56:08 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 57F7BB40F7; Wed, 27 Jul 2011 13:56:08 -0400 (EDT) Date: Wed, 27 Jul 2011 13:56:08 -0400 (EDT) From: Rick Macklem To: John Baldwin Message-ID: <1847245041.1083168.1311789368340.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201107270840.57104.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: Attilio Rao , freebsd-fs@freebsd.org Subject: Re: Does msodsfs_readdir() require a exclusively locked vnode X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 17:56:10 -0000 John Baldwin wrote: > On Tuesday, July 26, 2011 9:56:37 pm Attilio Rao wrote: > > 2011/7/26 Kostik Belousov : > > > On Tue, Jul 26, 2011 at 10:07:28AM -0400, Rick Macklem wrote: > > >> Kostik Belousov wrote: > > >> > On Mon, Jul 25, 2011 at 07:22:40PM -0400, Rick Macklem wrote: > > >> > > Hi, > > >> > > > > >> > > Currently both NFS servers set the vnode lock LK_SHARED > > >> > > and so do the local syscalls (at least that's how it looks > > >> > > by inspection?). > > >> > > > > >> > > Peter Holm just posted me this panic, where a test for an > > >> > > exclusive vnode lock fails in msdosfs_readdir(). > > >> > > KDB: stack backtrace: > > >> > > > db_trace_self_wrapper(c0efa6f6,c71627f8,c79230b0,c0f2ef29,f19154b8,...) > > >> > > at db_trace_self_wrapper+0x26 > > >> > > kdb_backtrace(c7f20b38,f19154fc,c0d586d5,f191550c,c7f20ae0,...) > > >> > > at > > >> > > kdb_backtrace+0x2a > > >> > > vfs_badlock(c101b180,f191550c,c1055580,c7f20ae0) at > > >> > > vfs_badlock+0x23 > > >> > > assert_vop_elocked(c7f20ae0,c0ee5f4f,c09f3213,8,0,...) at > > >> > > assert_vop_elocked+0x55 > > >> > > pcbmap(c7966e00,0,f191560c,f1915618,f191561c,...) at > > >> > > pcbmap+0x45 > > >> > > msdosfs_readdir(f1915960,c0f4b343,c7f20ae0,f1915940,0,...) at > > >> > > msdosfs_readdir+0x528 > > >> > > VOP_READDIR_APV(c101b180,f1915960,2,f1915a68,c7923000,...) at > > >> > > VOP_READDIR_APV+0xc5 > > >> > > nfsrvd_readdir(f1915b64,0,c7f20ae0,c7923000,f1915a68,...) at > > >> > > nfsrvd_readdir+0x38e > > >> > > nfsrvd_dorpc(f1915b64,0,c7923000,c842a200,4,...) at > > >> > > nfsrvd_dorpc+0x1f79 > > >> > > nfssvc_program(c7793800,c842a200,c0f24d67,492,0,...) 
at > > >> > > nfssvc_program+0x40f > > >> > > svc_run_internal(f1915d14,c09d9a98,c73dfa80,f1915d28,c0ef1130,...) > > >> > > at svc_run_internal+0x952 > > >> > > svc_thread_start(c73dfa80,f1915d28,c0ef1130,3a5,c7e4b2c0,...) > > >> > > at > > >> > > svc_thread_start+0x10 > > >> > > fork_exit(c0bed7d0,c73dfa80,f1915d28) at fork_exit+0xb8 > > >> > > fork_trampoline() at fork_trampoline+0x8 > > >> > > --- trap 0x804c12e, eip = 0xc, esp = 0x33, ebp = 0x1 --- > > >> > > pcbmap: 0xc7f20ae0 is not exclusive locked but should be > > >> > > KDB: enter: lock violation > > >> > > > > >> > > So, does anyone know if the msdosfs_readdir() really requires > > >> > > a > > >> > > LK_EXCLUSIVE > > >> > > locked vnode or is the ASSERT_VOP_ELOCKED() too strong in > > >> > > pcbmap()? > > >> > > > >> > Yes, msdosfs currently requires all vnode locks to be > > >> > exclusive. One > > >> > of > > >> > the reasons is that each denode (the msdosfs-private vnode > > >> > data) > > >> > carries > > >> > the fat entries cache, and this cache is updated even by the > > >> > operations > > >> > that do not modify vnode from the VFS POV. > > >> > > > >> > The locking regime is enforced by the getnewvnode() > > >> > initializing the > > >> > vnode > > >> > lock with LK_NOSHARE flag, and msdosfs code not calling > > >> > VN_LOCK_ASHARE() > > >> > on the newly instantiated vnode. > > >> > > > >> > My question is, was the vnode in question locked at all ? > > >> I think the problem is that I do a LK_DOWNGRADE. From a quick > > >> look at __lockmgr_args(), it doesn't check LK_NOSHARE for a > > >> LK_DOWNGRADE. > > >> > > >> Maybe __lockmgr_args() should have something like: > > >> if (op == LK_DOWNGRADE && (lk->lock_object.lo_flags & > > >> LK_NOSHARE)) > > >> return (0); /* noop */ > > >> after the > > >> if (op == LK_SHARED && (lk->lock_object.lo_flags & > > >> LK_NOSHARE)) > > >> op = LK_EXCLUSIVE; > > >> lines? > > > The RELENG_7 lockmgr does not check the NOSHARE flag on downgrade, > > > but I agree with the essence of your proposal. > > > > As long as the difference in semantic with the old lockmgr is > > correctly stressed out in the doc (and eventually comments) I'm fine > > with this change. > > I think it is a bug in the LK_NOSHARE implementation if the old > lockmgr() > didn't silently nop downgrade requests when LK_NOSHARE was set. :) We > should definitely fix it to ignore downgrades for LK_NOSHARE. > By the way, I think that __lockmgr_args() in -current doesn't check for LK_NOSHARE. That was what pho@ was testing when he found the problem. At this point, I believe that the new NFS server (which I have a patch for that pho@ is testing to avoid LK_DOWNGRADE) is the only place that is broken. (compute_cn_lkflags() only sets LK_SHARED if MNT_LOOKUP_SHARED is set and the only LK_DOWNGRADE I see is in vfs_cache.c when cn_lkflags == LK_SHARED. The rest are in file systems that handle LK_SHARED locked vnodes, from what I can see at a glance.) So, it isn't a difference between old/current behaviour, just a suggestion that adding a check in __lockmgr_args() might be a nice safety belt for the future, since __lockargs_mgr() already checks for the LK_SHARED case. 
rick, who will get the fix for the new NFS server to re@ soon From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 20:41:43 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A06C91065672 for ; Wed, 27 Jul 2011 20:41:43 +0000 (UTC) (envelope-from dpd@bitgravity.com) Received: from mail1.sjc1.bitgravity.com (mail1.sjc1.bitgravity.com [209.131.97.19]) by mx1.freebsd.org (Postfix) with ESMTP id 81A228FC1D for ; Wed, 27 Jul 2011 20:41:43 +0000 (UTC) Received: from mail-pz0-f52.google.com ([209.85.210.52]) by mail1.sjc1.bitgravity.com with esmtps (TLSv1:RC4-SHA:128) (Exim 4.69 (FreeBSD)) (envelope-from ) id 1QmAvH-000JWN-57; Wed, 27 Jul 2011 13:41:43 -0700 Received: by pzd13 with SMTP id 13so2863640pzd.25 for ; Wed, 27 Jul 2011 13:41:37 -0700 (PDT) Received: by 10.68.40.131 with SMTP id x3mr424521pbk.128.1311799296984; Wed, 27 Jul 2011 13:41:36 -0700 (PDT) Received: from netops-153.sfo1.bitgravity.com (netops-153.sfo1.bitgravity.com [209.131.110.153]) by mx.google.com with ESMTPS id m7sm217166pbk.70.2011.07.27.13.41.35 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 27 Jul 2011 13:41:36 -0700 (PDT) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: David P Discher In-Reply-To: <4E302204.2030009@FreeBSD.org> Date: Wed, 27 Jul 2011 13:41:34 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <6703F0BB-D4FC-4417-B519-CAFC62E5BC39@bitgravity.com> References: <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> <4E301F10.6060708@FreeBSD.org> <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk> <4E302204.2030009@FreeBSD.org> To: Steven Hartland X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@FreeBSD.org, Andriy Gapon Subject: Re: zfs process hang on pool access X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 20:41:43 -0000

The way I found this was breaking into the debugger, do some back traces, continue, break in again, do some more back traces on the hung processes ... see what is going on, then walk through the code. Then, once I had specific loops and code locations, I asked the higher powers of the FreeBSD kernel world.

Of course, I had the high cpu and was peaking at the arc_reclaim_thread. I've seen this nearly like clockwork in production at 106-107 days. If it goes on too much longer than that, then things deadlock.

But 112 days, and 8.2 ... you for sure have the LBOLT overflow.

Otherwise, reboot and patch. However, I have not fully vetted the patch under heavy load, and am currently seeing another deadlock issue with 8.1+ zfs v14 - but seemingly during writes after 6-40 hours. Still investigating.

Note, my proposal of "time_uptime" doesn't work - as it causes a buildworld error in zfs userland tools.
This is what I'm currently running to fix the 26 day issue with l2arc feeder and arc_reclaim_thread with LBOLT in 8.1.

Index: sys/cddl/compat/opensolaris/sys/time.h
===================================================================
--- sys/cddl/compat/opensolaris/sys/time.h	(.../8.1-BGOS-20110105)	(revision 3322)
+++ sys/cddl/compat/opensolaris/sys/time.h	(.../8.1-BGOS-20110613)	(working copy)
@@ -38,7 +38,7 @@
 
 typedef longlong_t hrtime_t;
 
-#define LBOLT ((gethrtime() * hz) / NANOSEC)
+#define LBOLT (gethrtime() * (NANOSEC/hz))
 
 #if defined(__i386__) || defined(__powerpc__)
 #define TIMESPEC_OVERFLOW(ts) \
Index: sys/cddl/compat/opensolaris/sys/types.h
===================================================================
--- sys/cddl/compat/opensolaris/sys/types.h	(.../8.1-BGOS-20110105)	(revision 3322)
+++ sys/cddl/compat/opensolaris/sys/types.h	(.../8.1-BGOS-20110613)	(working copy)
@@ -34,6 +34,12 @@
  */
 
 #include
+
+#ifdef _KERNEL
+typedef int64_t clock_t;
+#define _CLOCK_T_DECLARED
+#endif
+
 #include_next
 
 #define MAXNAMELEN 256

---
David P. Discher
dpd@bitgravity.com * AIM: bgDavidDPD
BITGRAVITY * http://www.bitgravity.com

On Jul 27, 2011, at 7:34 AM, Andriy Gapon wrote:

>> Ahh, is there anyway to confirm that before I reboot, or any other
>> information we could glean that might be useful?
>
> No quick ideas, unfortunately.
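As a back-of-the-envelope check on the figures quoted above (assuming hz=1000 and hrtime_t being a signed
64-bit count of nanoseconds since boot, as in the header being patched), the old LBOLT expression multiplies
nanoseconds by hz before dividing, and that intermediate product wraps INT64_MAX after roughly 106.75 days;
a tick count kept in a 32-bit clock_t wraps after roughly 24.9 days, which is presumably the "26 day issue"
the types.h hunk targets. Illustration only, not kernel code:

	#include <stdint.h>
	#include <stdio.h>

	int
	main(void)
	{
		const int64_t hz = 1000;
		const int64_t nanosec = 1000000000;

		/* Old LBOLT: (gethrtime() * hz) / NANOSEC -- the product wraps first. */
		double wrap64_days = ((double)(INT64_MAX / hz) / nanosec) / 86400.0;

		/* A tick count stored in a 32-bit clock_t wraps much sooner. */
		double wrap32_days = ((double)INT32_MAX / hz) / 86400.0;

		printf("gethrtime() * hz wraps after ~%.2f days\n", wrap64_days);	/* ~106.75 */
		printf("32-bit tick count wraps after ~%.2f days\n", wrap32_days);	/* ~24.86 */
		return (0);
	}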
List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 22:39:44 -0000 There seems to be loads of info about this but nothing concrete so I'm hoping someone here can answer some questions:- 1. Does newfs -E work on all controllers or only in combination with ahci ada driver? In our case the drivers are off an LSI controller using the mpt driver mpt0: port 0xfc00-0xfcff mem 0xdf2ec000-0xdf2effff,0xdf2f0000-0xdf2fffff irq 16 at device 0.0 on pci2 mpt0: [ITHREAD] mpt0: MPI Version=1.5.18.0 mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 ) mpt0: 0 Active Volumes (2 Max) mpt0: 0 Hidden Drive Members (14 Max) 2. If newfs -E doesn't work, which I suspect is the case, is using something like partedmagic boot cd and the secure erase app in that still an option or is that again thwarted by the LSI controller? 3. If neither #1 or #2 work is there an alternative which will without taking the drive out of the machine putting it in something which supports ada and running one of the above on that machine? My current testing seems to indicate neither #1 or #2 work in this case as write performance on Corsair SSD is still terrible after both. If #1 does require ata then it would be good to note this in the man page for newfs which currently indicates it will just work. da1 at mpt0 bus 0 scbus0 target 1 lun 0 da1: Fixed Direct Access SCSI-5 device da1: 300.000MB/s transfers da1: Command Queueing enabled da1: 57241MB (117231408 512 byte sectors: 255H 63S/T 7297C) By terrible I mean under 20MB/s sequential write speed where as a new drive in a similar machine is showing closer to 200MB/s write oldssd# dd if=/data/test of=/ssd/test bs=1m 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 60.430616 secs (17351734 bytes/sec) newssd# dd if=/data/test of=/ssd/test bs=1m 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 0.555287 secs (1888349211 bytes/sec) In both tests /data/test was just created from /dev/random onto a standard HD but is still in ARC so read speed is very high, hence not the limiting factor. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 23:50:02 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A15921065670; Wed, 27 Jul 2011 23:50:02 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 70F238FC13; Wed, 27 Jul 2011 23:50:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p6RNo217080334; Wed, 27 Jul 2011 23:50:02 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p6RNo2rE080330; Wed, 27 Jul 2011 23:50:02 GMT (envelope-from linimon) Date: Wed, 27 Jul 2011 23:50:02 GMT Message-Id: <201107272350.p6RNo2rE080330@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/159232: [ext2fs] [patch] fs/ext2fs: merge ext2_readwrite into ext2_vnops X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 23:50:02 -0000 Old Synopsis: fs/ext2fs: merge ext2_readwrite into ext2_vnops New Synopsis: [ext2fs] [patch] fs/ext2fs: merge ext2_readwrite into ext2_vnops Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 27 23:49:41 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=159232 From owner-freebsd-fs@FreeBSD.ORG Wed Jul 27 23:50:35 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A294106568A; Wed, 27 Jul 2011 23:50:35 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id D66978FC13; Wed, 27 Jul 2011 23:50:34 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p6RNoY9p083051; Wed, 27 Jul 2011 23:50:34 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p6RNoY2C083039; Wed, 27 Jul 2011 23:50:34 GMT (envelope-from linimon) Date: Wed, 27 Jul 2011 23:50:34 GMT Message-Id: <201107272350.p6RNoY2C083039@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/159233: [ext2fs] [patch] fs/ext2fs: finish reallocblk implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 27 Jul 2011 23:50:35 -0000 Old Synopsis: fs/ext2fs: finish reallocblk implementation New Synopsis: [ext2fs] [patch] fs/ext2fs: finish reallocblk implementation Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed Jul 27 23:50:21 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=159233 From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 01:24:59 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8CCE01065673 for ; Thu, 28 Jul 2011 01:24:59 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta03.westchester.pa.mail.comcast.net (qmta03.westchester.pa.mail.comcast.net [76.96.62.32]) by mx1.freebsd.org (Postfix) with ESMTP id 4D3ED8FC08 for ; Thu, 28 Jul 2011 01:24:59 +0000 (UTC) Received: from omta17.westchester.pa.mail.comcast.net ([76.96.62.89]) by qmta03.westchester.pa.mail.comcast.net with comcast id DDPt1h0011vXlb853DQzih; Thu, 28 Jul 2011 01:24:59 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta17.westchester.pa.mail.comcast.net with comcast id DDQf1h00C1t3BNj3dDQg9Y; Thu, 28 Jul 2011 01:24:41 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 91798102C36; Wed, 27 Jul 2011 18:24:37 -0700 (PDT) Date: Wed, 27 Jul 2011 18:24:37 -0700 From: Jeremy Chadwick To: Steven Hartland Message-ID: <20110728012437.GA23430@icarus.home.lan> References: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 01:24:59 -0000 On Wed, Jul 27, 2011 at 11:39:46PM +0100, Steven Hartland wrote: > There seems to be loads of info about this but nothing concrete so > I'm hoping someone here can answer some questions:- > > 1. Does newfs -E work on all controllers or only in combination > with ahci ada driver? In our case the drivers are off an LSI controller > using the mpt driver > > mpt0: port 0xfc00-0xfcff mem 0xdf2ec000-0xdf2effff,0xdf2f0000-0xdf2fffff irq 16 at device 0.0 on pci2 > mpt0: [ITHREAD] > mpt0: MPI Version=1.5.18.0 > mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 ) > mpt0: 0 Active Volumes (2 Max) > mpt0: 0 Hidden Drive Members (14 Max) > > 2. If newfs -E doesn't work, which I suspect is the case, is using > something like partedmagic boot cd and the secure erase app in that > still an option or is that again thwarted by the LSI controller? newfs -E is not the same thing as "Secure Erase" (issuing SECURE ERASE UNIT ATA command per ATA security data set spec). newfs -E does exactly what the man page says it does: it writes zeros over every LBA on the disk (but it does so in blocks, not on a literal per-LBA basis; e.g. it does not write 512 bytes (LBA size) of zeros to LBA 0, then 512 bytes of zeros to LBA 1, etc. -- it does so in larger chunks). The important thing to take away from this is that the FTL will not be reset to its factory-default configuration when erasing in this fashion. > 3. If neither #1 or #2 work is there an alternative which will > without taking the drive out of the machine putting it in something > which supports ada and running one of the above on that machine? > > My current testing seems to indicate neither #1 or #2 work in this > case as write performance on Corsair SSD is still terrible after > both. 
If #1 does require ata then it would be good to note this in > the man page for newfs which currently indicates it will just work. > > da1 at mpt0 bus 0 scbus0 target 1 lun 0 > da1: Fixed Direct Access SCSI-5 device > da1: 300.000MB/s transfers > da1: Command Queueing enabled > da1: 57241MB (117231408 512 byte sectors: 255H 63S/T 7297C) > > By terrible I mean under 20MB/s sequential write speed where as a > new drive in a similar machine is showing closer to 200MB/s write > > oldssd# dd if=/data/test of=/ssd/test bs=1m > 1000+0 records in > 1000+0 records out > 1048576000 bytes transferred in 60.430616 secs (17351734 bytes/sec) There are many factors to consider with SSDs when write speeds plummet. The biggest and most noticeable is how much free space is available on the drive itself. The less free space available, the worse wear levelling performs. I just got done dealing with a person on the Intel Community Forums who complained of shoddy write performance, where lots of "techs" completely ignored the fact that his drive was showing 90% full (only 7GB left). Is the /ssd partition actually aligned properly? I want to assume it's UFS, not ZFS, given your earlier questions, but is the partition aligned to a 8KByte boundary? (Most consumers tend to start their partitions at the 1MByte mark, but this is a bit overkill; I don't know what Corsair uses for NAND cell size nor erase page size, but with Intel the drives use 8KByte cells). Also, PRIOR to performing these tests, did you tunefs -t enable the filesystem? It matters; TRIM is a much nicer way to ensure the drive restores itself to performance when LBAs on the drive become unused by the filesystem (rather than letting the internal drive GC "figure it out" as best as it can, it's always best to just tell the drive up front with TRIM what's no longer used. Saves the FTL extra work) > newssd# dd if=/data/test of=/ssd/test bs=1m 1000+0 records in > 1000+0 records out > 1048576000 bytes transferred in 0.555287 secs (1888349211 bytes/sec) > > In both tests /data/test was just created from /dev/random onto > a standard HD but is still in ARC so read speed is very high, hence > not the limiting factor. Is there some reason your tests couldn't just use /dev/urandom directly to absolutely positively rule out read I/O (from if=) being a potential limiting factor? I absolutely believe you, but just sayin'... Worth reading is this whitepaper, by the way. http://www.stec-inc.com/downloads/whitepapers/Benchmarking_Enterprise_SSDs.pdf By the way, your above dd is the first time I've seen an SSD write 1.8GBytes in 0.5 seconds. Though I cannot rely entirely on benchmark reviews, the one I just skimmed indicated a fresh drive of your model tends to write, sequentially, at about 60MBytes/sec.. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 09:09:50 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AC6B5106566C for ; Thu, 28 Jul 2011 09:09:50 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 3BD0E8FC0C for ; Thu, 28 Jul 2011 09:09:50 +0000 (UTC) Received: by fxe4 with SMTP id 4so1402843fxe.13 for ; Thu, 28 Jul 2011 02:09:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding; bh=pQT8aa34JxgFKqhcpbbBYTsV/8/tXzw/cuPEX8ACQ2Y=; b=bM0pUAg6F23Le5oQu2tisbQYvcjgm61me//vv5uJ3BQ7QziDNpp8fiCdriXKo+Kx3S 839oTw63Lk2G5VYrpR7Y4qJz1/mDPQHZiW+uxXJp6TdZyTbC+GBSF19QJwcaz+BY3GOT NH4xUf0YrYPrYAEuG6VB+Yg+hp6oWAslX38Y8= Received: by 10.204.32.201 with SMTP id e9mr235937bkd.392.1311844189109; Thu, 28 Jul 2011 02:09:49 -0700 (PDT) Received: from mavbook2.mavhome.dp.ua (pc.mavhome.dp.ua [212.86.226.226]) by mx.google.com with ESMTPS id sz1sm205187bkb.58.2011.07.28.02.09.46 (version=SSLv3 cipher=OTHER); Thu, 28 Jul 2011 02:09:47 -0700 (PDT) Sender: Alexander Motin Message-ID: <4E312747.3020009@FreeBSD.org> Date: Thu, 28 Jul 2011 12:09:27 +0300 From: Alexander Motin User-Agent: Thunderbird 2.0.0.23 (X11/20091212) MIME-Version: 1.0 To: Steven Hartland References: In-Reply-To: X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 09:09:50 -0000 Steven Hartland wrote: > There seems to be loads of info about this but nothing concrete so > I'm hoping someone here can answer some questions:- > > 1. Does newfs -E work on all controllers or only in combination > with ahci ada driver? In our case the drivers are off an LSI controller > using the mpt driver > > mpt0: port 0xfc00-0xfcff mem > 0xdf2ec000-0xdf2effff,0xdf2f0000-0xdf2fffff irq 16 at device 0.0 on pci2 > mpt0: [ITHREAD] > mpt0: MPI Version=1.5.18.0 > mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 ) > mpt0: 0 Active Volumes (2 Max) > mpt0: 0 Hidden Drive Members (14 Max) `newfs -E` depends on disk driver's support for BIO_DELETE request. For now, AFAIR it is supported at least by ada, mmcsd, some cases of ad and few other cases. da driver doesn't support it now. Also, except da driver, TRIM command should be supported by the controller firmware, that implements SCSI<->ATA protocol translation, and AFAIK often it isn't so. > 2. If newfs -E doesn't work, which I suspect is the case, is using > something like partedmagic boot cd and the secure erase app in that > still an option or is that again thwarted by the LSI controller? Secure erase for the whole disk can be done using special ATA commands, unrelated to TRIM, but with the same end result. I have no idea if those commands have SCSI alternatives, but if so and they are used by mentioned software, there is a chance that controller firmware support them. 
-- Alexander Motin From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 09:20:15 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3E307106564A for ; Thu, 28 Jul 2011 09:20:15 +0000 (UTC) (envelope-from prvs=1190a6d8e6=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id BED7B8FC17 for ; Thu, 28 Jul 2011 09:20:14 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 10:09:26 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 10:09:25 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014354658.msg for ; Thu, 28 Jul 2011 10:09:24 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1190a6d8e6=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: From: "Steven Hartland" To: "Jeremy Chadwick" References: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> <20110728012437.GA23430@icarus.home.lan> Date: Thu, 28 Jul 2011 10:10:03 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 09:20:15 -0000 ----- Original Message ----- From: "Jeremy Chadwick" > newfs -E is not the same thing as "Secure Erase" (issuing SECURE ERASE > UNIT ATA command per ATA security data set spec). newfs -E does exactly > what the man page says it does: it writes zeros over every LBA on the > disk (but it does so in blocks, not on a literal per-LBA basis; e.g. it > does not write 512 bytes (LBA size) of zeros to LBA 0, then 512 bytes of > zeros to LBA 1, etc. -- it does so in larger chunks). > > The important thing to take away from this is that the FTL will not be > reset to its factory-default configuration when erasing in this fashion. It was my impression this was combined with a BIO_DELETE, which is the key part I thought, but that seems to only be supported under ada? > There are many factors to consider with SSDs when write speeds plummet. > > The biggest and most noticeable is how much free space is available on > the drive itself. The less free space available, the worse wear > levelling performs. I just got done dealing with a person on the Intel > Community Forums who complained of shoddy write performance, where lots > of "techs" completely ignored the fact that his drive was showing 90% > full (only 7GB left). Not the case here, the drive has over 60% free space, but a large move of data from one volume to many smaller volumes had just taken place. Still suprisingly large drop, over 10x slower that it was. > Is the /ssd partition actually aligned properly? 
I want to assume it's > UFS, not ZFS, given your earlier questions, but is the partition aligned > to a 8KByte boundary? (Most consumers tend to start their partitions at > the 1MByte mark, but this is a bit overkill; I don't know what Corsair > uses for NAND cell size nor erase page size, but with Intel the drives > use 8KByte cells). No its ZFS, and not exhibiting performance problems in the initial 6 months or so alignment is not the issue in this case, only after the data move yesterday did the lower performance get noticed. > > Also, PRIOR to performing these tests, did you tunefs -t enable the > filesystem? It matters; TRIM is a much nicer way to ensure the drive > restores itself to performance when LBAs on the drive become unused by > the filesystem (rather than letting the internal drive GC "figure it > out" as best as it can, it's always best to just tell the drive up front > with TRIM what's no longer used. Saves the FTL extra work) ZFS so no TRIM support :( > Is there some reason your tests couldn't just use /dev/urandom directly > to absolutely positively rule out read I/O (from if=) being a potential > limiting factor? I absolutely believe you, but just sayin'... Didn't realise there was a /dev/urandom, but /dev/random was very much limited, which reading the man page makes sense now, something to remember for next time :) > Worth reading is this whitepaper, by the way. > > http://www.stec-inc.com/downloads/whitepapers/Benchmarking_Enterprise_SSDs.pdf > > By the way, your above dd is the first time I've seen an SSD write > 1.8GBytes in 0.5 seconds. Though I cannot rely entirely on benchmark > reviews, the one I just skimmed indicated a fresh drive of your model > tends to write, sequentially, at about 60MBytes/sec.. Hmm, I must have copied the wrong results there some where, here's the correct one which shows 180MB/s, which is still lower than the spec's 285MB/s but its random data so not benefiting as much as it can from the compression on the sandforce controller, most defintielty not 1.8GB/s ;-) dd if=/data/test of=/ssd/test bs=1m 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 5.542815 secs (189177506 bytes/sec) As an update I've manged to get the drive back to full performance using Parted Magic boot cd, but using the manual process shown on the following page "instead" of using Disk Erase utility. Not sure why this didnt work yet. https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase Obviously having to boot to an alternative OS is far from ideal, so could really do with a BSD solution that has the ability to secure erase the disk, to restore performance, given the lack of TRIM in ZFS. Is this something that could be added to camcontrol or may be its already possible with "camcontrol cmd"? Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
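
For reference, the manual procedure on the ATA_Secure_Erase page linked above boils down, roughly, to the following hdparm sequence on the Linux side; the device name and the password ("Eins" here) are placeholders, and hdparm -I must report the drive as "not frozen" before the erase will be accepted:

hdparm -I /dev/sdX                                         # check security state: not locked, not frozen
hdparm --user-master u --security-set-pass Eins /dev/sdX
hdparm --user-master u --security-erase Eins /dev/sdX
hdparm -I /dev/sdX                                         # security should read "not enabled" again

If the sequence is interrupted part way through, the same password can be used with --security-unlock and then --security-disable to get the drive back into a usable state, as described later in this thread.
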
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 09:26:14 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 44A67106566B; Thu, 28 Jul 2011 09:26:14 +0000 (UTC) (envelope-from prvs=1190a6d8e6=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 957448FC16; Thu, 28 Jul 2011 09:26:13 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 10:25:42 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 10:25:41 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014354790.msg; Thu, 28 Jul 2011 10:25:40 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1190a6d8e6=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk Message-ID: <1E487B0F985745459272F052426964C7@multiplay.co.uk> From: "Steven Hartland" To: "Alexander Motin" References: <4E312747.3020009@FreeBSD.org> Date: Thu, 28 Jul 2011 10:26:19 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@freebsd.org Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 09:26:14 -0000 ----- Original Message ----- From: "Alexander Motin" > `newfs -E` depends on disk driver's support for BIO_DELETE request. For > now, AFAIR it is supported at least by ada, mmcsd, some cases of ad and > few other cases. da driver doesn't support it now. Also, except da > driver, TRIM command should be supported by the controller firmware, > that implements SCSI<->ATA protocol translation, and AFAIK often it > isn't so. That's what I thought might be the case, would be good to mention this in the man page as atm its very misleading. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
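
A rough way to see which side of that ada/da split a given disk falls on is camcontrol devlist, which lists each drive together with the peripheral driver it attached as; where a drive does attach via ada(4), the identify data it returns also indicates whether it advertises TRIM at all. The exact identify output wording varies between releases, so the grep below is only a coarse filter (device names are examples):

camcontrol devlist                        # shows whether the SSD came up as adaN or daN
camcontrol identify ada0 | grep -i trim   # DSM/TRIM capability as reported by the drive
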
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 10:32:39 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BE26A106566B for ; Thu, 28 Jul 2011 10:32:39 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta06.westchester.pa.mail.comcast.net (qmta06.westchester.pa.mail.comcast.net [76.96.62.56]) by mx1.freebsd.org (Postfix) with ESMTP id 6A39A8FC1F for ; Thu, 28 Jul 2011 10:32:38 +0000 (UTC) Received: from omta04.westchester.pa.mail.comcast.net ([76.96.62.35]) by qmta06.westchester.pa.mail.comcast.net with comcast id DNWl1h0020ldTLk56NYeyx; Thu, 28 Jul 2011 10:32:38 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta04.westchester.pa.mail.comcast.net with comcast id DNYc1h00T1t3BNj3QNYdfL; Thu, 28 Jul 2011 10:32:38 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 02132102C36; Thu, 28 Jul 2011 03:32:35 -0700 (PDT) Date: Thu, 28 Jul 2011 03:32:34 -0700 From: Jeremy Chadwick To: Steven Hartland Message-ID: <20110728103234.GA33275@icarus.home.lan> References: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> <20110728012437.GA23430@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 10:32:39 -0000 On Thu, Jul 28, 2011 at 10:10:03AM +0100, Steven Hartland wrote: > ----- Original Message ----- From: "Jeremy Chadwick" > > >> [snipping parts about BIO_DELETE and details pertaining to ZFS, >> hoping TRIM support gets added eventually, or possibly through GEOM >> directly someday...] > > Didn't realise there was a /dev/urandom, but /dev/random was very much > limited, which reading the man page makes sense now, something to remember > for next time :) Well, on FreeBSD /dev/urandom is a symlink to /dev/random. I've discussed in the past why I use /dev/urandom instead of /dev/random (I happen to work in a heterogeneous OS environment at work, where urandom and random are different things). I was mainly curious why you were using if=/some/actual/file rather than if=/dev/urandom directly. 'tis okay, not of much importance. > >Worth reading is this whitepaper, by the way. > > > >http://www.stec-inc.com/downloads/whitepapers/Benchmarking_Enterprise_SSDs.pdf > > > >By the way, your above dd is the first time I've seen an SSD write > >1.8GBytes in 0.5 seconds. Though I cannot rely entirely on benchmark > >reviews, the one I just skimmed indicated a fresh drive of your model > >tends to write, sequentially, at about 60MBytes/sec.. 
> > Hmm, I must have copied the wrong results there some where, here's > the correct one which shows 180MB/s, which is still lower than the spec's > 285MB/s but its random data so not benefiting as much as it can from the > compression on the sandforce controller, most defintielty not 1.8GB/s ;-) > > dd if=/data/test of=/ssd/test bs=1m 1000+0 records in > 1000+0 records out > 1048576000 bytes transferred in 5.542815 secs (189177506 bytes/sec) > > As an update I've manged to get the drive back to full performance using > Parted Magic boot cd, but using the manual process shown on the following > page "instead" of using Disk Erase utility. Not sure why this didnt work > yet. > https://ata.wiki.kernel.org/index.php/ATA_Secure_Erase Okay, so it sounds like what happened -- if I understand correctly -- is that your ZFS-based Corsair SSD volume (/ssd) recently had a bunch of data copied to it. It still had 60% free space available. After, the SSD performance for writes really plummeted (~20MByte/sec), but reads were still decent. Performing an actual ATA-level secure erase brought the drive back to normal write performance (~190MByte/sec). If all of that is correct, then I would say the issue is that the internal GC on the Corsair SSD in question sucks. With 60% of the drive still available, performance should not have dropped to such an abysmal rate; the FTL and wear levelling should have, ideally, dealt with this just fine. But it didn't. Why I'm focusing on the GC aspect: because ZFS (or GEOM; whatever, that's an engineering discussion for elsewhere) lacks TRIM. The underlying filesystem is therefore unable to tell the drive "hey, these LBAs aren't used any more, you can consider them free and perform a NAND page erase when an entire NAND page is unused". The FTL has to track all LBAs you've written to, otherwise if erasing a NAND page which still had used data in it (for the filesystem) it would result in loss of data. So in summary I'm not too surprised by this situation happening, but I *AM* surprised at just how horrible writes became for you. The white paper I linked you goes over this to some degree -- it talks about how everyone thinks SSDs are "so amazingly fast" yet nobody does benchmarks or talks about how horrible they perform when very little free space is available, or if the GC is badly implemented. Maybe Corsair's GC is badly implemented -- I don't know. I would see if there are any F/W updates for that model of drive. The firmware controls the GC model/method. Otherwise, if this issue is reproducible, I'll add this model of Corsair SSD to my list of drives to avoid. > Obviously having to boot to an alternative OS is far from ideal, so could > really do with a BSD solution that has the ability to secure erase the disk, > to restore performance, given the lack of TRIM in ZFS. > > Is this something that could be added to camcontrol or may be its already > possible with "camcontrol cmd"? Is it possible to accomplish Secure Erase via "camcontrol cmd" with ada(4)? Yes, but the procedure will be extremely painful, drawn out, and very error-prone. Given that you've followed the procedure on the Linux hdparm/ATA Secure Erase web page, you're aware of the security and "locked" status one has to deal with using password-protection to accomplish the erase. hdparm makes this easy because it's just a bunch of command-line flags; the ""heavy lifting"" on the ATA layer is done elsewhere. With "camcontrol cmd", you get to submit the raw ATA CDB yourself, multiple times, at different phases. 
Just how familiar with the ATA protocol are you? :-) Why I sound paranoid: a typo could potentially "brick" your drive. If you issue a set-password on the drive, ***ALL*** LBA accesses (read and write) return I/O errors from that point forward. Make a typo in the password, formulate the CDB wrong, whatever -- suddenly you have a drive that you can't access or use any more because the password was wrong, etc... If the user doesn't truly understand what they're doing (including the formulation of the CDB), then they're going to panic. camcontrol and atacontrol could both be modified to do the heavy lifting, making similar options/arguments that would mimic hdparm in operation. This would greatly diminish the risks, but the *EXACT PROCEDURE* would need to be explained in the man page. But keep reading for why that may not be enough. I've been in the situation where I've gone through the procedure you followed on said web page, only to run into a quirk with the ATA/IDE subsystem on Windows XP, requiring a power-cycle of the system. The secure erase finished, but I was panicking when I saw the drive spitting out I/O errors on every LBA. I realised that I needed to unlock the drive using --security-unlock then disable security by using --security-disable. Once I did that it was fine. The web page omits that part, in the case of emergency or anomalies are witnessed. This ordeal happened to me today, no joke, while tinkering with my new Intel 510 SSD. So here's a better page: http://tinyapps.org/docs/wipe_drives_hdparm.html Why am I pointing this out? Because, in effect, an entire "HOW TO DO THIS AND WHAT TO DO IF IT GOES HORRIBLY WRONG" section would need to be added to camcontrol/atacontrol to ensure people don't end up with "bricked" drives and blame FreeBSD. Trust me, it will happen. Give users tools to shoot themselves in the foot and they will do so. Furthermore, SCSI drives (which is what camcontrol has historically been for up until recently) have a completely different secure erase CDB command for them. ATA has SECURITY ERASE UNIT, SCSI has SECURITY INITIALIZE -- and in the SCSI realm, this feature is optional! So there's that error-prone issue as well. Do you know how many times I've issued "camcontrol inquiry" instead of "camcontrol identify" on my ada(4)-based systems? Too many. Food for thought. :-) Anyway, this is probably the only time you will ever find me saying this, but: if improving camcontrol/atacontrol to accomplish the above is what you want, patches are welcome. I could try to spend some time on this if there is great interest in the community for such (I'm more familiar with atacontrol's code given my SMART work in the past), and I do have an unused Intel 320-series SSD which I can test with. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 11:32:54 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B10E8106564A for ; Thu, 28 Jul 2011 11:32:54 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 7179A8FC16 for ; Thu, 28 Jul 2011 11:32:54 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1QmOph-00044i-Dh for freebsd-fs@freebsd.org; Thu, 28 Jul 2011 13:32:53 +0200 Received: from 52-212.dsl.iskon.hr ([89.164.52.212]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 28 Jul 2011 13:32:53 +0200 Received: from ivoras by 52-212.dsl.iskon.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 28 Jul 2011 13:32:53 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Thu, 28 Jul 2011 13:32:41 +0200 Lines: 11 Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: 52-212.dsl.iskon.hr User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20110624 Thunderbird/5.0 Subject: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 11:32:54 -0000 Grepping for "zil" in sysctls doesn't give anything useful: # sysctl -a | grep zil vfs.zfs.zil_replay_disable: 0 (its description is "Disable intent logging replay" so it looks like a crash recovery option) ... so is there a way to find out if ZIL is enabled? I can look at kenv but for some reason I can't trust its value right now. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 12:24:03 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 774011065672; Thu, 28 Jul 2011 12:24:03 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 7BB238FC0C; Thu, 28 Jul 2011 12:24:02 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA22839; Thu, 28 Jul 2011 15:24:01 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E3154E0.1030206@FreeBSD.org> Date: Thu, 28 Jul 2011 15:24:00 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Ivan Voras References: In-Reply-To: X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: ZFS how to find out if ZIL is currently enabled? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 12:24:03 -0000 on 28/07/2011 14:32 Ivan Voras said the following: > Grepping for "zil" in sysctls doesn't give anything useful: > > # sysctl -a | grep zil > vfs.zfs.zil_replay_disable: 0 > > (its description is "Disable intent logging replay" so it looks like a > crash recovery option) > > ... so is there a way to find out if ZIL is enabled? > > I can look at kenv but for some reason I can't trust its value right now. Here is a hammer: kgdb. But perhaps there is a more suitable tool :) -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 12:49:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E89C6106564A; Thu, 28 Jul 2011 12:49:28 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx1.freebsd.org (Postfix) with ESMTP id 8495F8FC14; Thu, 28 Jul 2011 12:49:28 +0000 (UTC) Received: by yic13 with SMTP id 13so2251290yic.13 for ; Thu, 28 Jul 2011 05:49:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=M5WN/R5T0gaVem4blEAcqF4ORKMuqRPlNcvPr8LJWvM=; b=V2zAZL77MR3xgfYEmOENxwUCVLmg3Q3Oxljs8sXmN86VVIDNsIXJSfxwqOo9brc+eO wCVRRiuTlKeEnWlqWgBvnRrpAn9Fl2GnUqNjsdC3RzUmkea5ZMzmMZ2cfKhQo3/A5Qml /sl67+ociHu/EWz05P7W8BRyA3kylNeDfTlNw= Received: by 10.100.233.21 with SMTP id f21mr734003anh.83.1311857367877; Thu, 28 Jul 2011 05:49:27 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.100.198.5 with HTTP; Thu, 28 Jul 2011 05:48:47 -0700 (PDT) In-Reply-To: <4E3154E0.1030206@FreeBSD.org> References: <4E3154E0.1030206@FreeBSD.org> From: Ivan Voras Date: Thu, 28 Jul 2011 14:48:47 +0200 X-Google-Sender-Auth: pOKrl9paXayALItLrjGF-XMzFjQ Message-ID: To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 12:49:29 -0000 On 28 July 2011 14:24, Andriy Gapon wrote: > on 28/07/2011 14:32 Ivan Voras said the following: >> Grepping for "zil" in sysctls doesn't give anything useful: >> >> # sysctl -a | grep zil >> vfs.zfs.zil_replay_disable: 0 >> >> (its description is "Disable intent logging replay" so it looks like a >> crash recovery option) >> >> ... so is there a way to find out if ZIL is enabled? >> >> I can look at kenv but for some reason I can't trust its value right now. > > Here is a hammer: kgdb. > But perhaps there is a more suitable tool :) Hmmm, no, it looks like the zil_disable code is missing in the latest 8-stable! This confirmes what I noticed in operation and why I didn't trust kenv. >From the various csup dates I have on the servers it looks like it's been removed somewhere between April and now, possibly with ZFS 28 MFC? I.e. 
this code is missing: *:/sys> grep -rn zil_disable * cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h:382:extern int zil_disable; cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:897: if (zil_disable) { cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:69:int zil_disable = 0; /* disable intent logging */ cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:71:TUNABLE_INT("vfs.zfs.zil_disable", &zil_disable); cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:72:SYSCTL_INT(_vfs_zfs, OID_AUTO, zil_disable, CTLFLAG_RW, &zil_disable, 0, cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c:450: if (bp->bio_cmd == BIO_FLUSH && !zil_disable) Any ideas? From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 13:00:43 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 77762106566B; Thu, 28 Jul 2011 13:00:43 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 601D98FC15; Thu, 28 Jul 2011 13:00:42 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA23637; Thu, 28 Jul 2011 16:00:40 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E315D78.90209@FreeBSD.org> Date: Thu, 28 Jul 2011 16:00:40 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Ivan Voras References: <4E3154E0.1030206@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Martin Matuska Subject: Re: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 13:00:43 -0000 on 28/07/2011 15:48 Ivan Voras said the following: > On 28 July 2011 14:24, Andriy Gapon wrote: >> on 28/07/2011 14:32 Ivan Voras said the following: >>> Grepping for "zil" in sysctls doesn't give anything useful: >>> >>> # sysctl -a | grep zil >>> vfs.zfs.zil_replay_disable: 0 >>> >>> (its description is "Disable intent logging replay" so it looks like a >>> crash recovery option) >>> >>> ... so is there a way to find out if ZIL is enabled? >>> >>> I can look at kenv but for some reason I can't trust its value right now. >> >> Here is a hammer: kgdb. >> But perhaps there is a more suitable tool :) > > Hmmm, no, it looks like the zil_disable code is missing in the latest > 8-stable! This confirmes what I noticed in operation and why I didn't > trust kenv. > >>From the various csup dates I have on the servers it looks like it's > been removed somewhere between April and now, possibly with ZFS 28 > MFC? http://www.mail-archive.com/freebsd-stable@freebsd.org/msg114251.html > I.e. 
this code is missing: > > *:/sys> grep -rn zil_disable * > cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h:382:extern int zil_disable; > cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:897: if > (zil_disable) { > cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:69:int zil_disable = > 0; /* disable intent logging */ > cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:71:TUNABLE_INT("vfs.zfs.zil_disable", > &zil_disable); > cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:72:SYSCTL_INT(_vfs_zfs, > OID_AUTO, zil_disable, CTLFLAG_RW, &zil_disable, 0, > cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c:450: if > (bp->bio_cmd == BIO_FLUSH && !zil_disable) > > Any ideas? -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 13:22:19 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6D8D2106564A for ; Thu, 28 Jul 2011 13:22:19 +0000 (UTC) (envelope-from prvs=1190a6d8e6=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id DE0218FC12 for ; Thu, 28 Jul 2011 13:22:18 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 14:21:46 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 14:21:46 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014357285.msg for ; Thu, 28 Jul 2011 14:21:44 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1190a6d8e6=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: From: "Steven Hartland" To: "Jeremy Chadwick" References: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> <20110728012437.GA23430@icarus.home.lan> <20110728103234.GA33275@icarus.home.lan> Date: Thu, 28 Jul 2011 14:22:21 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 13:22:19 -0000 ----- Original Message ----- From: "Jeremy Chadwick" > Well, on FreeBSD /dev/urandom is a symlink to /dev/random. I've > discussed in the past why I use /dev/urandom instead of /dev/random (I > happen to work in a heterogeneous OS environment at work, where urandom > and random are different things). > > I was mainly curious why you were using if=/some/actual/file rather than > if=/dev/urandom directly. 'tis okay, not of much importance. /dev/urandom seems to bottle neck at ~60MB/s a cached file generated from it doesn't e.g. 
dd if=/dev/random of=/dev/null bs=1m count=1000 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 16.152686 secs (64916509 bytes/sec) dd if=/dev/random of=/data/test bs=1m count=1000 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 16.178811 secs (64811685 bytes/sec) dd if=/data/test of=/dev/null bs=1m 1000+0 records in 1000+0 records out 1048576000 bytes transferred in 0.240348 secs (4362738865 bytes/sec) > Okay, so it sounds like what happened -- if I understand correctly -- is > that your ZFS-based Corsair SSD volume (/ssd) recently had a bunch of > data copied to it. It still had 60% free space available. After, the > SSD performance for writes really plummeted (~20MByte/sec), but reads > were still decent. Performing an actual ATA-level secure erase brought > the drive back to normal write performance (~190MByte/sec). Yes this is correct. > If all of that is correct, then I would say the issue is that the > internal GC on the Corsair SSD in question sucks. With 60% of the drive > still available, performance should not have dropped to such an abysmal > rate; the FTL and wear levelling should have, ideally, dealt with this > just fine. But it didn't. Agreed > Why I'm focusing on the GC aspect: because ZFS (or GEOM; whatever, > that's an engineering discussion for elsewhere) lacks TRIM. The > underlying filesystem is therefore unable to tell the drive "hey, these > LBAs aren't used any more, you can consider them free and perform a NAND > page erase when an entire NAND page is unused". The FTL has to track > all LBAs you've written to, otherwise if erasing a NAND page which still > had used data in it (for the filesystem) it would result in loss of > data. > > So in summary I'm not too surprised by this situation happening, but I > *AM* surprised at just how horrible writes became for you. The white > paper I linked you goes over this to some degree -- it talks about how > everyone thinks SSDs are "so amazingly fast" yet nobody does benchmarks > or talks about how horrible they perform when very little free space is > available, or if the GC is badly implemented. Maybe Corsair's GC is > badly implemented -- I don't know. Agreed again, we've seen a few disks now drop to this level of performance at first we thought the disk was failing, as the newfs -E didn't fix it when the man page indicates it should. But seems thats explained now, only works if its ada not da, and also not quite as good as a secure erase. > I would see if there are any F/W updates for that model of drive. The > firmware controls the GC model/method. Otherwise, if this issue is > reproducible, I'll add this model of Corsair SSD to my list of drives to > avoid. Its the latest firmware version, already checked that. The performance has been good till now and I suspect it could be a generic sandforce thing if its a firmware issue. > Is it possible to accomplish Secure Erase via "camcontrol cmd" with > ada(4)? Yes, but the procedure will be extremely painful, drawn out, > and very error-prone. > > Given that you've followed the procedure on the Linux hdparm/ATA Secure > Erase web page, you're aware of the security and "locked" status one has > to deal with using password-protection to accomplish the erase. hdparm > makes this easy because it's just a bunch of command-line flags; the > ""heavy lifting"" on the ATA layer is done elsewhere. With "camcontrol > cmd", you get to submit the raw ATA CDB yourself, multiple times, at > different phases. 
Just how familiar with the ATA protocol are you? :-) > > Why I sound paranoid: a typo could potentially "brick" your drive. If > you issue a set-password on the drive, ***ALL*** LBA accesses (read and > write) return I/O errors from that point forward. Make a typo in the > password, formulate the CDB wrong, whatever -- suddenly you have a drive > that you can't access or use any more because the password was wrong, > etc... If the user doesn't truly understand what they're doing > (including the formulation of the CDB), then they're going to panic. > > camcontrol and atacontrol could both be modified to do the heavy > lifting, making similar options/arguments that would mimic hdparm in > operation. This would greatly diminish the risks, but the *EXACT > PROCEDURE* would need to be explained in the man page. But keep reading > for why that may not be enough. > > I've been in the situation where I've gone through the procedure you > followed on said web page, only to run into a quirk with the ATA/IDE > subsystem on Windows XP, requiring a power-cycle of the system. The > secure erase finished, but I was panicking when I saw the drive spitting > out I/O errors on every LBA. I realised that I needed to unlock the > drive using --security-unlock then disable security by using > --security-disable. Once I did that it was fine. The web page omits > that part, in the case of emergency or anomalies are witnessed. This > ordeal happened to me today, no joke, while tinkering with my new Intel > 510 SSD. So here's a better page: > > http://tinyapps.org/docs/wipe_drives_hdparm.html > > Why am I pointing this out? Because, in effect, an entire "HOW TO DO > THIS AND WHAT TO DO IF IT GOES HORRIBLY WRONG" section would need to be > added to camcontrol/atacontrol to ensure people don't end up with > "bricked" drives and blame FreeBSD. Trust me, it will happen. Give > users tools to shoot themselves in the foot and they will do so. > > Furthermore, SCSI drives (which is what camcontrol has historically been > for up until recently) have a completely different secure erase CDB > command for them. ATA has SECURITY ERASE UNIT, SCSI has SECURITY > INITIALIZE -- and in the SCSI realm, this feature is optional! So > there's that error-prone issue as well. Do you know how many times I've > issued "camcontrol inquiry" instead of "camcontrol identify" on my > ada(4)-based systems? Too many. Food for thought. :-) > > Anyway, this is probably the only time you will ever find me saying > this, but: if improving camcontrol/atacontrol to accomplish the above is > what you want, patches are welcome. I could try to spend some time on > this if there is great interest in the community for such (I'm more > familiar with atacontrol's code given my SMART work in the past), and I > do have an unused Intel 320-series SSD which I can test with. This is of definite of interest here and I suspect to the rest of the community as well. I'm not at all familiar with ATA codes etc so I expect it would take me ages to come up with this. In our case SSD's are a must as HD's don't have the IOPs to deal with our application, we'll just need to manage the write speed drop offs. Performing offline maintenance to have them run at good speed is not ideal but much easier and more acceptable than booting another OS, which would a total PITA as some machines don't have IPMI with virtual media so means remote hands etc. 
Using a Backup -> Erase -> Restore direct from BSD would hence be my preferred workaround until TRIM support is added, but I guess that could well be some time for ZFS. Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 13:35:45 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5F29E106564A; Thu, 28 Jul 2011 13:35:45 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id E3FA88FC1F; Thu, 28 Jul 2011 13:35:44 +0000 (UTC) Received: by yxl31 with SMTP id 31so1905745yxl.13 for ; Thu, 28 Jul 2011 06:35:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=SLKq5rNuC+cQaVhJeVz8M0zBh2uz2nhytsbarKKU2+U=; b=O9K3gaHQnFYIBZ/oAtzmJkwQM1NLBohx5qFa+XNGyGI3JmV06XSdMAaaeykiiPhpsp NkFR/Z2PHA41C9vZqp3RUAHF9Y+zgluia8xxFez3rCRvW4UYj3qGTbvOJ7gFAGsKCceX 93Rw6aRgQNnN8kWo4hgik2D8oX0v/6LtUunnY= Received: by 10.101.158.19 with SMTP id k19mr27538ano.61.1311860144139; Thu, 28 Jul 2011 06:35:44 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.100.198.5 with HTTP; Thu, 28 Jul 2011 06:35:04 -0700 (PDT) In-Reply-To: <4E315D78.90209@FreeBSD.org> References: <4E3154E0.1030206@FreeBSD.org> <4E315D78.90209@FreeBSD.org> From: Ivan Voras Date: Thu, 28 Jul 2011 15:35:04 +0200 X-Google-Sender-Auth: UqEDwuNPHgOAGyi48aDrh1UyNfY Message-ID: To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 13:35:45 -0000 On 28 July 2011 15:00, Andriy Gapon wrote: > on 28/07/2011 15:48 Ivan Voras said the following: >>>From the various csup dates I have on the servers it looks like it's >> been removed somewhere between April and now, possibly with ZFS 28 >> MFC? > > http://www.mail-archive.com/freebsd-stable@freebsd.org/msg114251.html > >> I.e. this code is missing: I don't suppose that complaining about the removal of useful code will do any good? Sometimes you consciously need performance more than 100% reliability (and if the old documentation is right, disabling ZIL will not damage the file system itself, just increase the risk of user data loss). 
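
If the v28 import follows the upstream change linked earlier in this thread, the old global tunable is not simply gone but superseded by the per-dataset sync property, which exposes the same trade-off with finer granularity. A sketch, assuming a pool named tank (the name is a placeholder):

zfs get sync tank
zfs set sync=disabled tank    # accept possible loss of recent synchronous writes
zfs set sync=standard tank    # restore the default behaviour
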
From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 14:05:59 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 076261065670; Thu, 28 Jul 2011 14:05:59 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id EC9B58FC13; Thu, 28 Jul 2011 14:05:57 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA24612; Thu, 28 Jul 2011 17:05:55 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E316CC3.6070604@FreeBSD.org> Date: Thu, 28 Jul 2011 17:05:55 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:5.0) Gecko/20110705 Thunderbird/5.0 MIME-Version: 1.0 To: Ivan Voras References: <4E3154E0.1030206@FreeBSD.org> <4E315D78.90209@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 1.2pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Martin Matuska Subject: Re: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 14:05:59 -0000 on 28/07/2011 16:35 Ivan Voras said the following: > On 28 July 2011 15:00, Andriy Gapon wrote: >> on 28/07/2011 15:48 Ivan Voras said the following: > >>> >From the various csup dates I have on the servers it looks like it's >>> been removed somewhere between April and now, possibly with ZFS 28 >>> MFC? >> >> http://www.mail-archive.com/freebsd-stable@freebsd.org/msg114251.html >> >>> I.e. this code is missing: > > I don't suppose that complaining about the removal of useful code will > do any good? The question is obviously not directed to me? :-) > Sometimes you consciously need performance more than 100% reliability > (and if the old documentation is right, disabling ZIL will not damage > the file system itself, just increase the risk of user data loss). 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 14:24:36 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7784E106564A; Thu, 28 Jul 2011 14:24:36 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-gw0-f50.google.com (mail-gw0-f50.google.com [74.125.83.50]) by mx1.freebsd.org (Postfix) with ESMTP id EF8B88FC18; Thu, 28 Jul 2011 14:24:35 +0000 (UTC) Received: by gwj16 with SMTP id 16so2392149gwj.37 for ; Thu, 28 Jul 2011 07:24:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=SUCGkfXFsAIQUWTZsq5RiA9wKDKbx1WitagPJxygiIE=; b=Y0GncMH68hsLX8FueBw/jkPLIgnG9YGqu/FfrZjHmHL6UvvqyiJdNVbra/6jIpmaWM iU3wqToF/snNUIpVWj6H3GKIxQ5GlFL8BMpKK8gL95ysYv+rzzzcpHn2bsKPqj2HhY+4 4iVru7BQjlS2ySCk1cltuYEvp5Y+fgO+cbOOc= Received: by 10.101.18.6 with SMTP id v6mr82499ani.39.1311863075153; Thu, 28 Jul 2011 07:24:35 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.100.198.5 with HTTP; Thu, 28 Jul 2011 07:23:55 -0700 (PDT) In-Reply-To: <4E316CC3.6070604@FreeBSD.org> References: <4E3154E0.1030206@FreeBSD.org> <4E315D78.90209@FreeBSD.org> <4E316CC3.6070604@FreeBSD.org> From: Ivan Voras Date: Thu, 28 Jul 2011 16:23:55 +0200 X-Google-Sender-Auth: B7Cx1M7zYSik2x_GwMzJMiNJrS8 Message-ID: To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 14:24:36 -0000 On 28 July 2011 16:05, Andriy Gapon wrote: > on 28/07/2011 16:35 Ivan Voras said the following: >> On 28 July 2011 15:00, Andriy Gapon wrote: >>> on 28/07/2011 15:48 Ivan Voras said the following: >> >>>> >From the various csup dates I have on the servers it looks like it's >>>> been removed somewhere between April and now, possibly with ZFS 28 >>>> MFC? >>> >>> http://www.mail-archive.com/freebsd-stable@freebsd.org/msg114251.html >>> >>>> I.e. this code is missing: >> >> I don't suppose that complaining about the removal of useful code will >> do any good? > > The question is obviously not directed to me? 
:-) No, it was a question to the ZFS cabal :) We'll see if it remains rhetorical :) From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 14:59:26 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2A4ED106566B for ; Thu, 28 Jul 2011 14:59:26 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta14.emeryville.ca.mail.comcast.net (qmta14.emeryville.ca.mail.comcast.net [76.96.27.212]) by mx1.freebsd.org (Postfix) with ESMTP id 0F40A8FC0A for ; Thu, 28 Jul 2011 14:59:25 +0000 (UTC) Received: from omta05.emeryville.ca.mail.comcast.net ([76.96.30.43]) by qmta14.emeryville.ca.mail.comcast.net with comcast id DSxZ1h0090vp7WLAESzNfo; Thu, 28 Jul 2011 14:59:22 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta05.emeryville.ca.mail.comcast.net with comcast id DSzW1h00E1t3BNj8RSzYn2; Thu, 28 Jul 2011 14:59:34 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 198AA102C36; Thu, 28 Jul 2011 07:59:17 -0700 (PDT) Date: Thu, 28 Jul 2011 07:59:17 -0700 From: Jeremy Chadwick To: Steven Hartland Message-ID: <20110728145917.GA37805@icarus.home.lan> References: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> <20110728012437.GA23430@icarus.home.lan> <20110728103234.GA33275@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 14:59:26 -0000 On Thu, Jul 28, 2011 at 02:22:21PM +0100, Steven Hartland wrote: > ----- Original Message ----- From: "Jeremy Chadwick" > > >Well, on FreeBSD /dev/urandom is a symlink to /dev/random. I've > >discussed in the past why I use /dev/urandom instead of /dev/random (I > >happen to work in a heterogeneous OS environment at work, where urandom > >and random are different things). > > > >I was mainly curious why you were using if=/some/actual/file rather than > >if=/dev/urandom directly. 'tis okay, not of much importance. > > /dev/urandom seems to bottle neck at ~60MB/s a cached file generated from > it doesn't e.g. > dd if=/dev/random of=/dev/null bs=1m count=1000 > 1000+0 records in > 1000+0 records out > 1048576000 bytes transferred in 16.152686 secs (64916509 bytes/sec) > > dd if=/dev/random of=/data/test bs=1m count=1000 > 1000+0 records in > 1000+0 records out > 1048576000 bytes transferred in 16.178811 secs (64811685 bytes/sec) > > dd if=/data/test of=/dev/null bs=1m > 1000+0 records in > 1000+0 records out > 1048576000 bytes transferred in 0.240348 secs (4362738865 bytes/sec) /dev/urandom is highly CPU-bound. For example, on my home box it tops out at about 79MBytes/sec. I tend to use /dev/zero for I/O testing, since I really don't need the CPU tied up generating random data from entropy sources. The difference in speed is dramatic. So yes, I guess if you wanted to test high write speeds with purely randomised data as your source, creating a temporary file with content from /dev/urandom first would be your best bet. (Assuming, of course, that the source you plan to read from can transfer as fast as the writes to the destination, but that goes without saying). 
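If you want to see exactly where a source device tops out, independent of dd and its block sizes, a few lines of C are enough. Rough sketch, nothing clever -- point it at /dev/urandom or /dev/zero and give it a size in MiB:

/*
 * Rough sketch: time sequential 1 MiB reads from a source device,
 * e.g. /dev/urandom vs /dev/zero, to see where the source itself tops out.
 * Usage: ./readbench /dev/urandom 1000
 */
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	static char buf[1024 * 1024];		/* 1 MiB per read */
	struct timespec t0, t1;
	double secs;
	ssize_t n;
	long i, mib;
	int fd;

	if (argc != 3)
		errx(1, "usage: %s <device> <MiB>", argv[0]);
	mib = strtol(argv[2], NULL, 10);
	if (mib <= 0)
		errx(1, "MiB must be a positive number");

	fd = open(argv[1], O_RDONLY);
	if (fd < 0)
		err(1, "open %s", argv[1]);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < mib; i++) {
		n = read(fd, buf, sizeof(buf));
		if (n < 0)
			err(1, "read");
		if (n != (ssize_t)sizeof(buf))
			errx(1, "short read (%zd bytes)", n);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%ld MiB in %.2f s = %.1f MiB/s\n", mib, secs, mib / secs);
	close(fd);
	return (0);
}

It should land in the same ballpark as the dd figures above if the bottleneck really is the source device rather than the consumer.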
> >Okay, so it sounds like what happened -- if I understand correctly -- is > >that your ZFS-based Corsair SSD volume (/ssd) recently had a bunch of > >data copied to it. It still had 60% free space available. After, the > >SSD performance for writes really plummeted (~20MByte/sec), but reads > >were still decent. Performing an actual ATA-level secure erase brought > >the drive back to normal write performance (~190MByte/sec). > > Yes this is correct. > > >If all of that is correct, then I would say the issue is that the > >internal GC on the Corsair SSD in question sucks. With 60% of the drive > >still available, performance should not have dropped to such an abysmal > >rate; the FTL and wear levelling should have, ideally, dealt with this > >just fine. But it didn't. > > Agreed > > >Why I'm focusing on the GC aspect: because ZFS (or GEOM; whatever, > >that's an engineering discussion for elsewhere) lacks TRIM. The > >underlying filesystem is therefore unable to tell the drive "hey, these > >LBAs aren't used any more, you can consider them free and perform a NAND > >page erase when an entire NAND page is unused". The FTL has to track > >all LBAs you've written to, otherwise if erasing a NAND page which still > >had used data in it (for the filesystem) it would result in loss of > >data. > > > >So in summary I'm not too surprised by this situation happening, but I > >*AM* surprised at just how horrible writes became for you. The white > >paper I linked you goes over this to some degree -- it talks about how > >everyone thinks SSDs are "so amazingly fast" yet nobody does benchmarks > >or talks about how horrible they perform when very little free space is > >available, or if the GC is badly implemented. Maybe Corsair's GC is > >badly implemented -- I don't know. > > Agreed again, we've seen a few disks now drop to this level of performance > at first we thought the disk was failing, as the newfs -E didn't fix it when > the man page indicates it should. But seems thats explained now, only > works if its ada not da, and also not quite as good as a secure erase. I guess the newfs(8) man page should be rephrased then. When I read the description for the -E option, I see this paragraph: Erasing may take a long time as it writes to every sector on the disk. And immediately think "Oh, all it does is write zeros to every LBA, probably in blocks of some size that's unknown to me (vs. 512 bytes)". I can submit a PR + patch for this, but I'd propose the man page description for -E be changed to this: -E Erase the content of the disk before making the filesystem. The reserved area in front of the superblock (for bootcode) will not be erased. This option writes zeros to every sector (LBA) on the disk, in transfer sizes of, at most, 65536 * sectorsize bytes. Basically remove the mention of wear-leveling and "intended for use with flash devices". Any device can use this option as well; it's a UFS-esque equivalent of dd if=/dev/zero of=/dev/device bs=..., sans the exclusions mentioned. The tricky part is the "transfer sizes of, at most..." line. I'm certain someone will ask me where I got that from, so I'll explain it. Sorry for the long-winded stuff, but this is more or less how I learn, and I hope it benefits someone in the process. And man I sure hope I'm reading this code right... Down the rabbit hole we go: newfs(8) calls berase(3), which is part of libufs: 501 if (Eflag && !Nflag) { ... 
505 berase(&disk, sblock.fs_sblockloc / disk.d_bsize, 506 sblock.fs_size * sblock.fs_fsize - sblock.fs_sblockloc); The man page for berase(3) doesn't tell you the size of I/O transfer (the "block size") when it asks the kernel to effectively write zeros to the device. Looking at src/lib/libufs/block.c, we find this: 143 berase(struct uufsd *disk, ufs2_daddr_t blockno, ufs2_daddr_t size) ... 154 ioarg[0] = blockno * disk->d_bsize; 155 ioarg[1] = size; 156 rv = ioctl(disk->d_fd, DIOCGDELETE, ioarg); This ioctl(2) (DIOCGDELETE) is not documented anywhere in the entire source code tree (grep -r DIOCGDELETE /usr/src returns absolutely no documentation references). Furthermore, at this point we still have no idea whow the arguments being passed to ioctl are used; is "size" the total size, or is it the transfer size of the write we're going to issue? DIOCGDELETE is handled in src/sys/geom/geom_dev.c, where we finally get some answers: 293 case DIOCGDELETE: 294 offset = ((off_t *)data)[0]; 295 length = ((off_t *)data)[1]; ... 303 while (length > 0) { 304 chunk = length; 305 if (chunk > 65536 * cp->provider->sectorsize) 306 chunk = 65536 * cp->provider->sectorsize; 307 error = g_delete_data(cp, offset, chunk); 308 length -= chunk; 309 offset += chunk; So ioctl[0] is the offset, and ioctl[1] represents the actual TOTAL SIZE of what we want erased, NOT the transfer block size itself. The block size itself is calculated on line 306, so 65536 * the actual GEOM provider's "advertised sector size". On SSDs, this would be 512 bytes (no I am not kidding). But we're still not finished. What is g_delete_data? It's an internal GEOM function which does what it's told (heh :-) ). src/sys/geom/geom_io.c sheds light on that: 739 g_delete_data(struct g_consumer *cp, off_t offset, off_t length) 740 { 741 struct bio *bp; 742 int error; 743 744 KASSERT(length > 0 && length >= cp->provider->sectorsize, 745 ("g_delete_data(): invalid length %jd", (intmax_t)length)); 746 747 bp = g_alloc_bio(); 748 bp->bio_cmd = BIO_DELETE; 749 bp->bio_done = NULL; 750 bp->bio_offset = offset; 751 bp->bio_length = length; 752 bp->bio_data = NULL; 753 g_io_request(bp, cp); 754 error = biowait(bp, "gdelete"); ... Okay, so without going into g_io_request() (did I not say something about rabbit holes earlier?), we can safely assume that's even more abstraction around a BIO_DELETE call. bp->bio_length is the size of the data to tinker with, in bytes. So in summary, with a 512-byte "advertised sector" disk, the erase would happen in 32MByte "transfer size blocks". Let's test that theory with an mdconfig(8) "disk" and a slightly modified version of newfs(8) that tells us what the value of the 3rd argument is that it's passing to berase(3): # mdconfig -a -t malloc -s 256m -o reserve -u 0 md0 # sysctl -b kern.geom.conftxt | strings | grep md0 0 MD md0 268435456 512 u 0 s 512 f 0 fs 0 l 268435456 t malloc Sector size of the md0 pseudo-disk is 512 bytes (5th parameter). Now the modified newfs: # ~jdc/tmp/newfs/newfs -E /dev/md0 /dev/md0: 256.0MB (524288 sectors) block size 16384, fragment size 2048 using 4 cylinder groups of 64.02MB, 4097 blks, 8256 inodes. Erasing sectors [128...524287] berase() 3rd arg: 268369920 super-block backups (for fsck -b #) at: 160, 131264, 262368, 393472 There's the printf() I added ("berase()..."). So the argument passed to berase() is 268369920 (the size of the pseudo-disk, sans the area before the superblock, in this case 4 CGs at 16384 block size, so 65536 bytes; 268435456 - 268369920 == 65536). 
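(Side note: you can also drive that ioctl from userland yourself. A bare-bones sketch follows -- the /dev/md0 path and the 64KB front offset are only illustrative, and it really will erase whatever you aim it at, so stick to scratch devices:)

/*
 * Bare-bones sketch (illustrative only): issue the same DIOCGDELETE
 * ioctl that berase(3) uses, directly against a GEOM provider.
 * ioarg[0] is the byte offset, ioarg[1] the TOTAL length to erase;
 * geom_dev.c then chops the request into 65536 * sectorsize chunks.
 * WARNING: this destroys data on whatever device you point it at.
 */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/disk.h>		/* DIOCGDELETE, DIOCGMEDIASIZE */
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	const char *dev = "/dev/md0";	/* scratch device from the mdconfig test */
	off_t ioarg[2], mediasize;
	int fd;

	fd = open(dev, O_RDWR);
	if (fd < 0)
		err(1, "open %s", dev);
	if (ioctl(fd, DIOCGMEDIASIZE, &mediasize) < 0)
		err(1, "DIOCGMEDIASIZE");

	ioarg[0] = 65536;		/* skip the front area, like newfs -E did above */
	ioarg[1] = mediasize - 65536;	/* total bytes to erase, not a chunk size */
	if (ioctl(fd, DIOCGDELETE, ioarg) < 0)
		err(1, "DIOCGDELETE");

	printf("deleted %jd bytes starting at offset %jd on %s\n",
	    (intmax_t)ioarg[1], (intmax_t)ioarg[0], dev);
	close(fd);
	return (0);
}

That is essentially all berase(3) does before handing the work to GEOM.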
Now back to the geom_dev.c code with the data we know: - Line 395 would assign length to 268369920 - Line 304 would assign chunk to 268369920 - Line 305 conditional would prove true; 268369920 > 33554432 (65536*512), so chunk becomes 33554432 - Line 307 "and within" does the actual zeroing The reason the man page can't say 32MBytes explicitly is because it's dynamic (based on sector size). I imagine, somewhere down the road, we WILL have disks that start advertising non-512-byte sector sizes. As of this writing none I have seen do (SSDs nor WD -EARS drives). > >I would see if there are any F/W updates for that model of drive. The > >firmware controls the GC model/method. Otherwise, if this issue is > >reproducible, I'll add this model of Corsair SSD to my list of drives to > >avoid. > > Its the latest firmware version, already checked that. The performance > has been good till now and I suspect it could be a generic sandforce > thing if its a firmware issue. SandForce-based SSDs have a history of being extremely good with their GC, but I've never used one. However, if I remember right (something I read not more than a week ago, I just can't remember where!), it's very rare that any SF-based SSD vendor uses the stock SF firmware. They modify the hell out of it. Meaning: two SSDs using the exact same model of SF controller doesn't mean they'll behave the exact same. Hmm, I probably read this on some SSD review site, maybe Anandtech. I imagine the same applies to Marvell-based SSD controllers too. > >Is it possible to accomplish Secure Erase via "camcontrol cmd" with > >ada(4)? Yes, but the procedure will be extremely painful, drawn out, > >and very error-prone. > > > >Given that you've followed the procedure on the Linux hdparm/ATA Secure > >Erase web page, you're aware of the security and "locked" status one has > >to deal with using password-protection to accomplish the erase. hdparm > >makes this easy because it's just a bunch of command-line flags; the > >""heavy lifting"" on the ATA layer is done elsewhere. With "camcontrol > >cmd", you get to submit the raw ATA CDB yourself, multiple times, at > >different phases. Just how familiar with the ATA protocol are you? :-) > > > >Why I sound paranoid: a typo could potentially "brick" your drive. If > >you issue a set-password on the drive, ***ALL*** LBA accesses (read and > >write) return I/O errors from that point forward. Make a typo in the > >password, formulate the CDB wrong, whatever -- suddenly you have a drive > >that you can't access or use any more because the password was wrong, > >etc... If the user doesn't truly understand what they're doing > >(including the formulation of the CDB), then they're going to panic. > > > >camcontrol and atacontrol could both be modified to do the heavy > >lifting, making similar options/arguments that would mimic hdparm in > >operation. This would greatly diminish the risks, but the *EXACT > >PROCEDURE* would need to be explained in the man page. But keep reading > >for why that may not be enough. > > > >I've been in the situation where I've gone through the procedure you > >followed on said web page, only to run into a quirk with the ATA/IDE > >subsystem on Windows XP, requiring a power-cycle of the system. The > >secure erase finished, but I was panicking when I saw the drive spitting > >out I/O errors on every LBA. I realised that I needed to unlock the > >drive using --security-unlock then disable security by using > >--security-disable. Once I did that it was fine. 
The web page omits > >that part, in the case of emergency or anomalies are witnessed. This > >ordeal happened to me today, no joke, while tinkering with my new Intel > >510 SSD. So here's a better page: > > > >http://tinyapps.org/docs/wipe_drives_hdparm.html > > > >Why am I pointing this out? Because, in effect, an entire "HOW TO DO > >THIS AND WHAT TO DO IF IT GOES HORRIBLY WRONG" section would need to be > >added to camcontrol/atacontrol to ensure people don't end up with > >"bricked" drives and blame FreeBSD. Trust me, it will happen. Give > >users tools to shoot themselves in the foot and they will do so. > > > >Furthermore, SCSI drives (which is what camcontrol has historically been > >for up until recently) have a completely different secure erase CDB > >command for them. ATA has SECURITY ERASE UNIT, SCSI has SECURITY > >INITIALIZE -- and in the SCSI realm, this feature is optional! So > >there's that error-prone issue as well. Do you know how many times I've > >issued "camcontrol inquiry" instead of "camcontrol identify" on my > >ada(4)-based systems? Too many. Food for thought. :-) > > > >Anyway, this is probably the only time you will ever find me saying > >this, but: if improving camcontrol/atacontrol to accomplish the above is > >what you want, patches are welcome. I could try to spend some time on > >this if there is great interest in the community for such (I'm more > >familiar with atacontrol's code given my SMART work in the past), and I > >do have an unused Intel 320-series SSD which I can test with. > > This is of definite of interest here and I suspect to the rest of the > community as well. I'm not at all familiar with ATA codes etc so I > expect it would take me ages to come up with this. > > In our case SSD's are a must as HD's don't have the IOPs to deal with > our application, we'll just need to manage the write speed drop offs. > > Performing offline maintenance to have them run at good speed is > not ideal but much easier and more acceptable than booting another OS, > which would a total PITA as some machines don't have IPMI with virtual > media so means remote hands etc. > > Using a Backup -> Erase -> Restore direct from BSD would hence be my > preferred workaround until TRIM support is added, but I guess that could > well be some time for ZFS. Understood. I'm off work this week so I'll see if I can dedicate some time to it. Too many non-work projects I'm juggling right now, argh. I'll have to start with camcontrol since the test system I have uses ada(4) and not classic ata(4). I'm not even sure what I'm really in for given that I've never looked at camcontrol's code before. If I "brick" my SSD I'll send you a bill, Steven. Kidding. :-) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 15:27:11 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 93FC5106566B for ; Thu, 28 Jul 2011 15:27:11 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.emeryville.ca.mail.comcast.net (qmta09.emeryville.ca.mail.comcast.net [76.96.30.96]) by mx1.freebsd.org (Postfix) with ESMTP id 7950E8FC15 for ; Thu, 28 Jul 2011 15:27:11 +0000 (UTC) Received: from omta19.emeryville.ca.mail.comcast.net ([76.96.30.76]) by qmta09.emeryville.ca.mail.comcast.net with comcast id DTSW1h0011eYJf8A9TT72B; Thu, 28 Jul 2011 15:27:08 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta19.emeryville.ca.mail.comcast.net with comcast id DTNR1h01R1t3BNj01TNde6; Thu, 28 Jul 2011 15:22:41 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id BE4CE102C36; Thu, 28 Jul 2011 08:21:51 -0700 (PDT) Date: Thu, 28 Jul 2011 08:21:51 -0700 From: Jeremy Chadwick To: Ivan Voras Message-ID: <20110728152151.GA39317@icarus.home.lan> References: <4E3154E0.1030206@FreeBSD.org> <4E315D78.90209@FreeBSD.org> <4E316CC3.6070604@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Andriy Gapon Subject: Re: ZFS how to find out if ZIL is currently enabled? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 15:27:11 -0000 On Thu, Jul 28, 2011 at 04:23:55PM +0200, Ivan Voras wrote: > On 28 July 2011 16:05, Andriy Gapon wrote: > > on 28/07/2011 16:35 Ivan Voras said the following: > >> On 28 July 2011 15:00, Andriy Gapon wrote: > >>> on 28/07/2011 15:48 Ivan Voras said the following: > >> > >>>> >From the various csup dates I have on the servers it looks like it's > >>>> been removed somewhere between April and now, possibly with ZFS 28 > >>>> MFC? > >>> > >>> http://www.mail-archive.com/freebsd-stable@freebsd.org/msg114251.html > >>> > >>>> I.e. this code is missing: > >> > >> I don't suppose that complaining about the removal of useful code will > >> do any good? > > > > The question is obviously not directed to me? :-) > > No, it was a question to the ZFS cabal :) We'll see if it remains rhetorical :) What about this? http://blog.tschokko.de/archives/786 http://milek.blogspot.com/2010/05/zfs-synchronous-vs-asynchronous-io.html # zfs get all backups | grep sync backups sync standard default -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 15:47:15 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BF8F7106564A for ; Thu, 28 Jul 2011 15:47:15 +0000 (UTC) (envelope-from prvs=1190a6d8e6=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 30B6B8FC0C for ; Thu, 28 Jul 2011 15:47:14 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 16:46:42 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Thu, 28 Jul 2011 16:46:42 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014359165.msg for ; Thu, 28 Jul 2011 16:46:40 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1190a6d8e6=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: <2A07CD8AE6AE49A5BAED59A7E547D1F9@multiplay.co.uk> From: "Steven Hartland" To: "Jeremy Chadwick" References: <13BEC27B17D24D0CBF2E6A98FD3227F3@multiplay.co.uk> <20110728012437.GA23430@icarus.home.lan> <20110728103234.GA33275@icarus.home.lan> <20110728145917.GA37805@icarus.home.lan> Date: Thu, 28 Jul 2011 16:47:21 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance under FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 15:47:15 -0000 ----- Original Message ----- From: "Jeremy Chadwick" > I guess the newfs(8) man page should be rephrased then. When I read the > description for the -E option, I see this paragraph: > > Erasing may take a long time as it writes to every sector > on the disk. > > And immediately think "Oh, all it does is write zeros to every LBA, > probably in blocks of some size that's unknown to me (vs. 512 bytes)". > > I can submit a PR + patch for this, but I'd propose the man page > description for -E be changed to this: > > -E Erase the content of the disk before making the filesystem. > The reserved area in front of the superblock (for bootcode) > will not be erased. > > This option writes zeros to every sector (LBA) on the disk, > in transfer sizes of, at most, 65536 * sectorsize bytes. It actually does more than this using BIO_DELETE to tell the disk its unallocated now aka (TRIM) but needs to state its only suppored on some controllers / disk drivers. > Basically remove the mention of wear-leveling and "intended for use > with flash devices". Any device can use this option as well; it's a > UFS-esque equivalent of dd if=/dev/zero of=/dev/device bs=..., sans the > exclusions mentioned. I believe it does this if its supported, which atm means ada, thats what needs clarifying. 
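For the record, a trivial bit of C will show what that per-request ceiling works out to on a given provider (the /dev/ada0 default below is just an example, pass any device node you like):

/*
 * Trivial sketch: print a provider's sector size and the resulting
 * per-request BIO_DELETE ceiling (65536 * sectorsize) used by the
 * DIOCGDELETE loop in geom_dev.c.  The default device path is only
 * an example.
 */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/disk.h>		/* DIOCGSECTORSIZE, DIOCGMEDIASIZE */
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/ada0";
	off_t mediasize;
	u_int sectorsize;
	int fd;

	fd = open(dev, O_RDONLY);
	if (fd < 0)
		err(1, "open %s", dev);
	if (ioctl(fd, DIOCGSECTORSIZE, &sectorsize) < 0)
		err(1, "DIOCGSECTORSIZE");
	if (ioctl(fd, DIOCGMEDIASIZE, &mediasize) < 0)
		err(1, "DIOCGMEDIASIZE");

	printf("%s: %u byte sectors, %jd bytes total\n",
	    dev, sectorsize, (intmax_t)mediasize);
	printf("max delete chunk: %ju bytes (65536 * sectorsize)\n",
	    (uintmax_t)65536 * sectorsize);
	close(fd);
	return (0);
}

On a 512-byte sector device that prints the 32MB figure discussed above.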
> SandForce-based SSDs have a history of being extremely good with their > GC, but I've never used one. However, if I remember right (something I > read not more than a week ago, I just can't remember where!), it's very > rare that any SF-based SSD vendor uses the stock SF firmware. They > modify the hell out of it. Meaning: two SSDs using the exact same model > of SF controller doesn't mean they'll behave the exact same. Hmm, I > probably read this on some SSD review site, maybe Anandtech. I imagine > the same applies to Marvell-based SSD controllers too. Yer quite possibly. >> Using a Backup -> Erase -> Restore direct from BSD would hence be my >> preferred workaround until TRIM support is added, but I guess that could >> well be some time for ZFS. > > Understood. I'm off work this week so I'll see if I can dedicate some > time to it. Too many non-work projects I'm juggling right now, argh. > > I'll have to start with camcontrol since the test system I have uses > ada(4) and not classic ata(4). I'm not even sure what I'm really in for > given that I've never looked at camcontrol's code before. > > If I "brick" my SSD I'll send you a bill, Steven. Kidding. :-) If you need a test SSD lmk an address offlist and I'll sort, least we can do :) Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 28 15:52:49 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8BD9E1065674; Thu, 28 Jul 2011 15:52:49 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com [209.85.161.182]) by mx1.freebsd.org (Postfix) with ESMTP id 34A1D8FC12; Thu, 28 Jul 2011 15:52:48 +0000 (UTC) Received: by gxk28 with SMTP id 28so2441043gxk.13 for ; Thu, 28 Jul 2011 08:52:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=khQM03DMmVfAoxmvtLo4vJNYDXW7h2379wCfzqJiGMc=; b=IXbGPVsSoD0cl7MMKF0W7QZzQ8Y0TbYPP19ucSV5uncJ/TTWKTPqNoxwyv6ZC+ZJsY 67cTu6maBh5EkVS/plWqA+UDNC7+We24sWx5lpNGSRVZp2s+QAuFQw2yPmlUYkonqIkk 1qSsZKsvFZ7+RbC6G8cLztfAm2DJSPFO4dcjM= Received: by 10.100.211.11 with SMTP id j11mr174419ang.17.1311868368147; Thu, 28 Jul 2011 08:52:48 -0700 (PDT) MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.100.198.5 with HTTP; Thu, 28 Jul 2011 08:52:08 -0700 (PDT) In-Reply-To: <20110728152151.GA39317@icarus.home.lan> References: <4E3154E0.1030206@FreeBSD.org> <4E315D78.90209@FreeBSD.org> <4E316CC3.6070604@FreeBSD.org> <20110728152151.GA39317@icarus.home.lan> From: Ivan Voras Date: Thu, 28 Jul 2011 17:52:08 +0200 X-Google-Sender-Auth: mnnxSNZlP5mQGc5qLK_IbRWy4q8 Message-ID: To: Jeremy Chadwick Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org, Andriy Gapon Subject: Re: ZFS how to find out if ZIL is currently enabled? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 28 Jul 2011 15:52:49 -0000 On 28 July 2011 17:21, Jeremy Chadwick wrote: > On Thu, Jul 28, 2011 at 04:23:55PM +0200, Ivan Voras wrote: >> On 28 July 2011 16:05, Andriy Gapon wrote: >> > on 28/07/2011 16:35 Ivan Voras said the following: >> >> On 28 July 2011 15:00, Andriy Gapon wrote: >> >>> on 28/07/2011 15:48 Ivan Voras said the following: >> >> >> >>>> >From the various csup dates I have on the servers it looks like it's >> >>>> been removed somewhere between April and now, possibly with ZFS 28 >> >>>> MFC? >> >>> >> >>> http://www.mail-archive.com/freebsd-stable@freebsd.org/msg114251.html >> >>> >> >>>> I.e. this code is missing: >> >> >> >> I don't suppose that complaining about the removal of useful code will >> >> do any good? >> > >> > The question is obviously not directed to me? :-) >> >> No, it was a question to the ZFS cabal :) We'll see if it remains rhetorical :) > > What about this? > > http://blog.tschokko.de/archives/786 > http://milek.blogspot.com/2010/05/zfs-synchronous-vs-asynchronous-io.html Hey, that's great! I didn't know about it - the sync property is good enough for me! (even better as it is per-fs) I've just enabled it where I need it and I can see the difference. From owner-freebsd-fs@FreeBSD.ORG Fri Jul 29 05:55:05 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 658C2106566C; Fri, 29 Jul 2011 05:55:05 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 35E378FC0C; Fri, 29 Jul 2011 05:55:05 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p6T5t5Ii093912; Fri, 29 Jul 2011 05:55:05 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p6T5t5tb093908; Fri, 29 Jul 2011 05:55:05 GMT (envelope-from linimon) Date: Fri, 29 Jul 2011 05:55:05 GMT Message-Id: <201107290555.p6T5t5tb093908@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/159251: [zfs] [request]: add FLETCHER4 as DEDUP hash option X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 Jul 2011 05:55:05 -0000 Old Synopsis: FEATURE REQUEST: add FLETCHER4 as DEDUP hash option New Synopsis: [zfs] [request]: add FLETCHER4 as DEDUP hash option Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Fri Jul 29 05:54:25 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=159251