From owner-freebsd-fs@FreeBSD.ORG Sun Nov 16 05:23:10 2014
Message-Id: <201411160522.sAG5MwG7009367@maildrop31.somerville.occnc.com>
From: Curtis Villamizar
Reply-To: curtis@ipv6.occnc.com
To: Steven Hartland
Cc: "freebsd-fs@freebsd.org", curtis@ipv6.occnc.com
Subject: Re: zpool create on md hangs
In-reply-to: Your message of "Mon, 10 Nov 2014 09:27:26 +0000." <546084FE.80300@multiplay.co.uk>
Date: Sun, 16 Nov 2014 00:22:58 -0500

In message <546084FE.80300@multiplay.co.uk>
Steven Hartland writes:

> On 10/11/2014 06:48, Andreas Nilsson wrote:
> > On Mon, Nov 10, 2014 at 7:37 AM, Curtis Villamizar wrote:
> >
> >> The following shell program produces a hang.  It's reproducible (hangs
> >> every time).
> >>
> >> #!/bin/sh
> >>
> >> set -e
> >> set -x
> >>
> >> truncate -s `expr 10 \* 1024 \* 1024 \* 1024` /image-file
> >> md_unit=`mdconfig -a -n -t vnode -f /image-file`
> >> echo "md device is /dev/md$md_unit"
> >> zpool create test md$md_unit
> >>
> >> The zpool command hangs.  Kill or kill -9 has no effect.  All
> >> filesystems are unaffected but any other zpool or zfs command will
> >> hang and be unkillable.  A reboot is needed.
> >>
> >> This is running on:
> >>
> >> FreeBSD 10.0-STABLE (GENERIC) #0 r270645: Wed Aug 27 00:54:29 EDT 2014
> >>
> >> When I get a chance, I will try again with a 10.1 RC3 kernel I
> >> recently built.  If this still doesn't work, I'll build an r11 kernel
> >> since the code differs from 10.1, not having the svm code merged in.
> >> I'm asking before poking around further in case anyone has insights
> >> into why this might happen.
> >>
> >> BTW- The reason to create a zfs filesystem on a vnode type md is to
> >> create an image that can run under bhyve using a zfs root fs.  This
> >> works quite nicely for combinations of geom types (gmirror, gstripe,
> >> gjournal, gcache) but zpool hangs when trying this with zfs.
> >>
> >> Curtis
> >>
> >> ps- please keep me on the Cc as I'm not subscribed to freebsd-fs.
> >>
> > Freezes here on 10.1-RC2-p1 (amd64) as well.
> > ^T says:
> > load: 0.21 cmd: zpool 74063 [zio->io_cv] 8.84r 0.00u 0.00s 0% 3368k
> >
>
> I suspect you're just seeing the delay as it TRIMs the file, and it will
> complete in time.
>
> Try setting vfs.zfs.vdev.trim_on_init=0 before running the create and
> see if it completes quickly after that.
>
> I tested this on HEAD and confirmed it was the case there.
>
> Regards
> Steve

Steve,

Thanks for the hint.

I'm doing some testing so I'm doing this quite a bit, but it's
automated.  For a while I was continuing to just let it take 4-10
minutes.

The symptoms while the trim is happening are that any zpool or zfs
commands hang and don't respond to a kill or even a kill -9.  I've
also had a few cases where a "shutdown -r now" flushed buffers but
wouldn't get to the reboot and the machine had to be powered off, plus
I had one apparent hang of the entire disk subsystem.  I'm currently
using FreeBSD 10.1-PRERELEASE #0 r274470.

All of these symptoms go away with vfs.zfs.vdev.trim_on_init=0 so I
put it in my sysctl.conf files.

Maybe it should be the default given the severity of behavior with
vfs.zfs.vdev.trim_on_init=1 (too late for 10.1).

Comments in the code call it an "optimization".  Does anyone know
exactly what the trim does?  Anything useful or necessary?

Curtis

[fyi- unrelated] I'm performance testing configurations of disk with a
vm under a compile load and using make -j maxjobs with various values
of maxjobs.  So far under bhyve it takes about 50% longer than native.
Natively, combinations of stripe, mirror, journal, cache, and zfs vs
ufs make little difference, about 5%.  Within a vm, a disk stripe runs
faster than a mirror (as expected).  I'm still early in testing, but
I'm trying the mirror or stripe on the host vs on the vm, and other
permutations.

I've been running mirrored disks since about 1994 (with the original
vinum, then gvinum, then geom mirror, then zfs mirror) but I've never
taken the time to check performance.  I see a 15:1 difference with the
CPUs I have (old single core Intel no longer used vs 4 core atom vs 4
core i3) but so far only a 50% penalty for big compiles in a vm vs the
same processor native.  Above -j 4 there is a small performance gain
with zfs (which is generally slightly slower) but none for the others.

I did a fair amount of testing for native disk.  I've only started
testing vm disk permutations, but in doing this testing I'm learning a
lot about bhyve and geom and zfs quirks.

From owner-freebsd-fs@FreeBSD.ORG Sun Nov 16 08:10:24 2014
Date: Sun, 16 Nov 2014 19:01:29 +1100
From: Dylan Leigh
To: Olivier Cochard-Labbé
Cc: freebsd-fs@freebsd.org
Subject: Re: No more free space after upgrading to 10.1 and zpool upgrade
Message-ID: <20141116080128.GA20042@exhan.dylanleigh.net>

On Sat, Nov 15, 2014 at 07:11:36PM +0100, Olivier Cochard-Labbé wrote:
> Hi,
>
> I've upgraded a small NAS to 10.1 and issued a zpool upgrade.
> But now I've got a problem: my volume doesn't have any free space anymore
> (it should have about 50GB of free space).
>
> # zpool list storage
> NAME      SIZE  ALLOC   FREE   FRAG  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
> storage  9,06T  8,87T   198G    15%         -    97%  1.00x  ONLINE  -
> # zfs list storage
> NAME      USED  AVAIL  REFER  MOUNTPOINT
> storage  7,09T      0   404M  /storage
>
> => notice the 0 in "AVAIL"

Could you provide some other details about the pool structure/config,
including the output of "zpool status"?

Cheers,
Dylan

--
Dylan Leigh // VU# s4081906 // www.dylanleigh.net

From owner-freebsd-fs@FreeBSD.ORG Sun Nov 16 08:39:05 2014
Message-ID: <5468626E.5090701@FreeBSD.org>
Date: Sun, 16 Nov 2014 10:38:06 +0200
From: Andriy Gapon
To: curtis@ipv6.occnc.com, Steven Hartland
Cc: "freebsd-fs@freebsd.org"
Subject: Re: zpool create on md hangs
References: <546084FE.80300@multiplay.co.uk> <201411160522.sAG5MwG7009367@maildrop31.somerville.occnc.com>
In-Reply-To: <201411160522.sAG5MwG7009367@maildrop31.somerville.occnc.com>

On 16/11/2014 07:22, Curtis Villamizar wrote:
> Does
> anyone know exactly what the trim does?  Anything useful or necessary?

Google certainly does :-)
Have you tried asking?

--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Sun Nov 16 15:10:50 2014
Sender: cochard@gmail.com
In-Reply-To: <20141116080128.GA20042@exhan.dylanleigh.net>
References: <20141116080128.GA20042@exhan.dylanleigh.net>
From: Olivier Cochard-Labbé
Date: Sun, 16 Nov 2014 16:10:28 +0100
Subject: Re: No more free space after upgrading to 10.1 and zpool upgrade
To: Dylan Leigh
Cc: freebsd-fs@freebsd.org

On Sun, Nov 16, 2014 at 9:01 AM, Dylan Leigh wrote:
>
> Could you provide some other details about the pool structure/config,
> including the output of "zpool status"?
>

It's a raidz1 pool built with 5 SATA 2TB drives, and there are 5
zvolumes without advanced features (no compression, no snapshot, no
de-dup, etc.).
Because it's a raidz1 pool, I know that the FREE space reported by
"zpool list" includes redundancy overhead and is bigger than the AVAIL
space reported by "zfs list".

I've moved about 100GB (one hundred gigabytes) of files and after this
step there were only 2GB (two gigabytes) of free space. How is that
possible?

About the zpool status:

  pool: storage
 state: ONLINE
  scan: scrub in progress since Sun Nov 16 07:58:54 2014
        4,28T scanned out of 8,78T at 1/s, (scan is slow, no estimated time)
        0 repaired, 48,80% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        storage                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            ada0p1                                      ONLINE       0     0     0
            ada1p1                                      ONLINE       0     0     0
            ada2p1                                      ONLINE       0     0     0
            ada3p1                                      ONLINE       0     0     0
            gptid/7ef9b7e4-fc4c-11e1-b75e-009c029758a0  ONLINE       0     0     0

errors: No known data errors

From owner-freebsd-fs@FreeBSD.ORG Sun Nov 16 21:00:06 2014
Message-Id: <201411162100.sAGL06BC023683@kenobi.freebsd.org>
From: bugzilla-noreply@FreeBSD.org
To: freebsd-fs@FreeBSD.org
Subject: Problem reports for freebsd-fs@FreeBSD.org that need special attention
Date: Sun, 16 Nov 2014 21:00:06 +0000

To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status          |    Bug Id | Description
----------------+-----------+-------------------------------------------------
Needs MFC       |    136470 | [nfs] Cannot mount / in read-only, over NFS
Needs MFC       |    139651 | [nfs] mount(8): read-only remount of NFS volume
Needs MFC       |    144447 | [zfs] sharenfs fsunshare() & fsshare_main() non

3 problems total for which you should take action.

From owner-freebsd-fs@FreeBSD.ORG Mon Nov 17 08:00:09 2014
Message-Id: <201411170800.sAH809SB020717@kenobi.freebsd.org>
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [FreeBSD Bugzilla] Commit Needs MFC
Date: Mon, 17 Nov 2014 08:00:09 +0000

Hi,

You have a bug in the "Needs MFC" state which has not been touched in 7 or
more days. This email serves as a reminder that you may want to MFC this bug
or mark it as completed.

In the event you have a longer MFC timeout you may update this bug with a
comment and I won't remind you again for 7 days.

This reminder is only sent on Mondays. Please file a bug about concerns you
may have.

This search was scheduled by eadler@FreeBSD.org.

(3 bugs)

Bug 136470:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=136470
  Severity: Affects Only Me
  Priority: Normal
  Hardware: Any
  Assignee: freebsd-fs@FreeBSD.org
  Status: Needs MFC
  Resolution:
  Summary: [nfs] Cannot mount / in read-only, over NFS

Bug 139651:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=139651
  Severity: Affects Only Me
  Priority: Normal
  Hardware: Any
  Assignee: freebsd-fs@FreeBSD.org
  Status: Needs MFC
  Resolution:
  Summary: [nfs] mount(8): read-only remount of NFS volume does not work

Bug 144447:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=144447
  Severity: Affects Only Me
  Priority: Normal
  Hardware: Any
  Assignee: freebsd-fs@FreeBSD.org
  Status: Needs MFC
  Resolution:
  Summary: [zfs] sharenfs fsunshare() & fsshare_main() non functional

From owner-freebsd-fs@FreeBSD.ORG Mon Nov 17 11:29:48 2014
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 191573] [zfs] kernel panic when running zpool/add/files.t
Date: Mon, 17 Nov 2014 11:29:48 +0000
X-Bugzilla-Version: 11.0-CURRENT
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: smh@FreeBSD.org
X-Bugzilla-Status: In Discussion
X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org
X-Bugzilla-Changed-Fields: see_also

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191573

Steven Hartland changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195061

--
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-fs@FreeBSD.ORG Mon Nov 17 11:32:23 2014
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 191573] [zfs] kernel panic when running zpool/add/files.t
Date: Mon, 17 Nov 2014 11:32:23 +0000
X-Bugzilla-Version: 11.0-CURRENT
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: commit-hook@freebsd.org
X-Bugzilla-Status: In Discussion
X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191573

--- Comment #18 from commit-hook@freebsd.org ---
A commit references this bug:

Author: smh
Date: Mon Nov 17 11:32:12 UTC 2014
New revision: 274619
URL: https://svnweb.freebsd.org/changeset/base/274619

Log:
  Disable TRIM on file backed ZFS vdevs and fix TRIM on init

  After r265152 TRIM requests are ZIO_TYPE_FREE instead of ZIO_TYPE_IOCTL.
  This meant file backed vdevs attempted to process the ZIO as a write,
  causing a panic.

  We now disable TRIM on file backed vdevs and ASSERT the ZIO types supported
  by each vdev type to ensure we explicitly support the ZIO type being
  processed.

  Also ensure that TRIM on init is not processed for devices which declare
  they don't support TRIM via vdev_notrim.

  PR:            195061, 194976, 191573
  Sponsored by:  Multiplay

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_disk.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_file.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c

--
You are receiving this mail because:
You are the assignee for the bug.
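
Separately from this commit, the workaround reported in the "zpool create on md hangs" thread was to set vfs.zfs.vdev.trim_on_init=0 before running zpool create on the md device. A minimal sketch of that sequence follows, assuming a FreeBSD host and root privileges. The run helper, the DRY_RUN variable, and the create_md_pool function name are illustrative and do not come from the thread; the commands themselves are the ones quoted there. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only: with DRY_RUN=1 (the default) commands are printed, not run.
# Set DRY_RUN=0 on a FreeBSD host, as root, to actually execute them.
DRY_RUN=${DRY_RUN:-1}

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# create_md_pool IMAGE POOL SIZE_BYTES
create_md_pool() {
    image=$1
    pool=$2
    size=$3

    # Skip the initial whole-device TRIM that made zpool create stall.
    run sysctl vfs.zfs.vdev.trim_on_init=0
    run truncate -s "$size" "$image"

    if [ "$DRY_RUN" = 1 ]; then
        echo "+ mdconfig -a -n -t vnode -f $image"
        md_unit=N    # placeholder: the real unit number is printed by mdconfig
    else
        md_unit=$(mdconfig -a -n -t vnode -f "$image")
    fi

    run zpool create "$pool" "md$md_unit"
}

create_md_pool /image-file test $((10 * 1024 * 1024 * 1024))
```

To make the setting persistent, as described in the thread, the same name=value pair can go in /etc/sysctl.conf.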

From owner-freebsd-fs@FreeBSD.ORG Mon Nov 17 11:36:24 2014
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 191573] [zfs] kernel panic when running zpool/add/files.t
Date: Mon, 17 Nov 2014 11:36:24 +0000
X-Bugzilla-Version: 11.0-CURRENT
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: commit-hook@freebsd.org
X-Bugzilla-Status: In Discussion
X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191573

--- Comment #19 from commit-hook@freebsd.org ---
A commit references this bug:

Author: smh
Date: Mon Nov 17 11:35:30 UTC 2014
New revision: 274620
URL: https://svnweb.freebsd.org/changeset/base/274620

Log:
  Revert r273630 as the panic was fixed by r274619

  The panic was caused by TRIM requests run against file based vdevs as
  write requests.

  PR:            191573
  Sponsored by:  Multiplay

Changes:
  head/tools/regression/zfs/zpool/add/files.t

--
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-fs@FreeBSD.ORG Mon Nov 17 11:38:23 2014
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 191573] [zfs] kernel panic when running zpool/add/files.t
Date: Mon, 17 Nov 2014 11:38:23 +0000
X-Bugzilla-Version: 11.0-CURRENT
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: smh@FreeBSD.org
X-Bugzilla-Status: In Discussion
X-Bugzilla-Assigned-To: smh@FreeBSD.org
X-Bugzilla-Changed-Fields: assigned_to

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191573

Steven Hartland changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|freebsd-fs@FreeBSD.org      |smh@FreeBSD.org

--
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-fs@FreeBSD.ORG Mon Nov 17 12:05:59 2014
From: Borja Marcos
Subject: BIOS booting from disks > 2TB
Date: Mon, 17 Nov 2014 12:58:48 +0100
Message-Id: <9A929629-2EA9-47FB-A8A8-1874BB0283A5@sarenet.es>
To: "freebsd-fs@FreeBSD.org Filesystems"

Hi

I've been trying to install FreeBSD on a 3TB disk attached to a Proliant
Microserver G8. The system is unable to boot, complaining about an error
with the GPT backup table. I don't have the machine here now, but I can
reproduce this evening if someone really needs to see the actual messages.

The machine doesn't have UEFI, it can only boot via BIOS, and I understand
that the reading of the backup table is failing because the BIOS can only
use 32 bits to specify a sector number.

Would it be possible to "fix" the boot process so that, if the following
conditions apply, that check is omitted?

- Boot using BIOS, not UEFI
- bits(disk size) > 32

What would be the side effects? Of course we can't assume it will be safe
to boot from a ZFS pool with devices larger than 2TB, as there is no
guarantee that all the blocks it needs to read are within the 2 TB limit.
But it should be safe to, at least, boot from a UFS partition contained
below the 2 TB limit.

Any thoughts?

Borja.
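
The 2 TB figure follows from the premise stated in the message: a 32-bit sector number combined with 512-byte sectors. The sketch below works through that arithmetic, taking the premise at face value rather than confirming how this particular BIOS behaves. The function names and the 3000592982016-byte disk size are illustrative assumptions and do not come from the message. The backup GPT header is stored in the last sector of the disk, which is why it is the first structure to fall outside the addressable range.

```shell
#!/bin/sh
# max_addressable_bytes SECTOR_SIZE
# Bytes reachable when the sector number is limited to 32 bits.
max_addressable_bytes() {
    echo $(( (1 << 32) * $1 ))
}

# needs_more_than_32_bits DISK_BYTES SECTOR_SIZE
# Succeeds when the last sector number (where the backup GPT header
# lives) does not fit in 32 bits.
needs_more_than_32_bits() {
    last_lba=$(( $1 / $2 - 1 ))
    [ "$last_lba" -gt $(( (1 << 32) - 1 )) ]
}

max_addressable_bytes 512    # prints 2199023255552 (2 TiB)

# Illustrative size for a nominal 3TB disk with 512-byte sectors.
if needs_more_than_32_bits 3000592982016 512; then
    echo "backup GPT header lies beyond a 32-bit sector number"
fi
```

Under the same assumption, a partition that ends below sector 2^32 stays fully addressable, which is the case the message suggests for a UFS boot partition.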
From owner-freebsd-fs@FreeBSD.ORG Mon Nov 17 12:57:27 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E5B36A8 for ; Mon, 17 Nov 2014 12:57:27 +0000 (UTC) Received: from mail1.postbank.bg (mx.postbank.bg [195.242.126.253]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mail.postbank.bg", Issuer "GeoTrust DV SSL CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2D2F1A0 for ; Mon, 17 Nov 2014 12:57:25 +0000 (UTC) X-AuditID: ac100165-f791c6d000000c67-d2-5469ed23a2ea Received: from sofdc01excv16.postbank.bg ( [10.1.129.39]) (using TLS with cipher AES128-SHA (128/128 bits)) (Client did not present a certificate) by mail1.postbank.bg (Eurobank AD BG Outbound mail system) with SMTP id BC.FA.03175.32DE9645; Mon, 17 Nov 2014 14:42:11 +0200 (EET) From: "Ivailo A. 
Tanusheff" To: "freebsd-fs@freebsd.org" Subject: ZFS and glabel Thread-Topic: ZFS and glabel Thread-Index: AdACY7LrZviKSHcMQ1qsE5/tkf/bpg== Date: Mon, 17 Nov 2014 12:42:09 +0000 Message-ID: <1422065A4E115F409E22C1EC9EDAFBA4220D0DB7@sofdc01exc02.postbank.bg> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.1.2.26] Content-Type: text/plain; charset="us-ascii" content-transfer-encoding: quoted-printable MIME-Version: 1.0 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrLKsWRmVeSWpSXmKPExsXCxdiorqv8NjPE4PA8WYtjj3+yOTB6zPg0 nyWAMaqB0SYxLy+/JLEkVSEltTjZVik6IL+4JCkxLztWwSWzODknMTM3tUhJITPFVslYSaEg JzE5NTc1r8RWKbGgIDUvRcmOSwED2ACVZeYppOYl56dk5qXbKnkG++taWJha6hoq2SFMtVJT NjRO+M+eMetUD2vBT5GKtxuuszQwHuDvYuTgkBAwkXi81LmLkRPIFJO4cG89G4gtJDCHSWLG FCkQmw2oZNvcPUwgtoiAqcSvfwvAaoQFxCUuzrrABhGXkXi1dzYrhK0n0fBzFVg9i4CqxNL1 s8DivAL+El82PgSrZwTa9f3UGrAaZqA5t57MZ4K4QUBiyZ7zzBC2qMTLx/9YIWxZiUffHkPV 60gs2P2JDcLWlli28DUzxHxBiZMzn7BMYBSahWTsLCQts5C0zELSsoCRZRWjZHF+WkqygWGw r3uZgaFeATSC9JLSNzEC43eNAGPqDsYXV5wOMQpwMCrx8O7IzgwRYk0sK67MPcQowcGsJMIb cxEoxJuSWFmVWpQfX1Sak1p8iHE/IzAYJjJLiSbnA5NLXkm8oYmBiYmZpamphbGZBfWFDU1N jAwtDMwsTEkTVhLntV4EdL9AOjCZZqemFqQWwbzAxMEp1cAYv73sosilg57hLxtn8iya/7Aw p1lk1sc8nVenu/3mCdZ/vzOTvTM3//za1t87r+gqFvcy2W19/aV+s8ZK7/5SdYVf02NVk/4+ tVjMWnszWFhdvL07Zd287CbPENW9Hm438jdMnvxwQfz3rsDlOaGLWNxupTzceaNV4JtJ74+q 2siNjHY2saxKLMUZiYZazEXFiQDdQF64gAMAAA== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Nov 2014 12:57:28 -0000

Dear all,

I ran into an interesting issue and I would like to discuss it with all of you.
The whole thing began with me trying to identify available HDDs to include in a zfs pool through a script/program.
I assumed that the easiest way of doing this is using glabel.
For example:

root@FreeBSD:~ # glabel status
Name                                        Status  Components
gptid/248e758c-e267-11e3-95bb-08002796202b  N/A     ada0p1
diskid/DISK-VBdd471206-91164057             N/A     ada5
diskid/DISK-VBe98b5e75-0d8cf6dc             N/A     ada8
diskid/DISK-VB7d006584-01beca12             N/A     ada6
diskid/DISK-VB721029c3-66a60156             N/A     ada7
diskid/DISK-VB31481dbb-639540a1             N/A     ada2
diskid/DISK-VB95921208-4eb19f41             N/A     ada4

So far it is OK, and if I create a pool like

    zpool create xxx ada4

then the line for ada4 will disappear from the glabel status.

As far as I remember, though, it is not recommended to base production pools on the device naming, so I wanted to switch to a label, i.e. diskid/DISK-VB95921208-4eb19f41.
When I recreate the pool like:

    zpool create xxx diskid/DISK-VB95921208-4eb19f41

the pool is created without problems, but the device does not disappear from the glabel status list, so my program gives wrong results.

Is this a problem with the zfs implementation, with my server, or is the general idea wrong?

BTW, if I label the disk additionally, like:

    glabel create VB95921208-4eb19f41 ada4
    zpool create xxx label/VB95921208-4eb19f41

the glabel status again shows the right information. The problem with this last approach is that if someone executes:

    glabel destroy -f VB95921208-4eb19f41

the result becomes:

  pool: xxx
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://illumos.org/msg/ZFS-8000-HC
  scan: none requested
config:

        NAME                   STATE     READ WRITE CKSUM
        xxx                    UNAVAIL      0     0     0
          6968348230421469155  REMOVED      0     0     0  was /dev/label/VB95921208-4eb19f41

And the data is practically unrecoverable.

So my questions are:
- Is there a way to make glabel show the right data when I use diskid/DISK-VB95921208-4eb19f41?
- Which is the most proper way of creating vdevs: with the disk name (ada4), the diskid (diskid/DISK-VB95921208-4eb19f41) or manual labeling?
- How may I find which disks are free, if the diskid approach is the best solution?

Regards,
Ivailo Tanusheff

Disclaimer:
This communication is confidential. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you have received this communication by mistake, please notify us immediately by responding to this email and then delete it from your system.
Eurobank Bulgaria AD is not responsible for, nor endorses, any opinion, recommendation, conclusion, solicitation, offer or agreement or any information contained in this communication.
Eurobank Bulgaria AD cannot accept any responsibility for the accuracy or completeness of this message as it has been transmitted over a public network. If you suspect that the message may have been intercepted or amended, please call the sender.

From owner-freebsd-fs@FreeBSD.ORG Tue Nov 18 02:16:23 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DF0378FF for ; Tue, 18 Nov 2014 02:16:23 +0000 (UTC) Received: from homiemail-a4.g.dreamhost.com (homie.mail.dreamhost.com [208.97.132.208]) by mx1.freebsd.org (Postfix) with ESMTP id BE3EEA6E for ; Tue, 18 Nov 2014 02:16:23 +0000 (UTC) Received: from homiemail-a4.g.dreamhost.com (localhost [127.0.0.1]) by homiemail-a4.g.dreamhost.com (Postfix) with ESMTP id D505F51C063; Mon, 17 Nov 2014 18:16:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=dylanleigh.net; h=date :from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to:content-transfer-encoding; s= dylanleigh.net; bh=x5AT4/HCKxBnCbnYM7SjVBgs+nk=; b=uWQ0D0iLx6mXO
nVUGG8bnbR2Kxt4HcIS/6uM29ZZaDvnmPiPEU8NbL7zZIQ9V9hC4Xk/aYOOEbmYI dPiMfjvfXGmskvu5Zp8MTYPZ9wkZ9h4Gw6ejRoEkulJT/sok5LBJOxUcjMSsGofz XW9JHOZL6bPrJ/xRSB5/5mHZ1XVYzg= Received: from exhan.dylanleigh.net (ppp118-209-100-198.lns20.mel4.internode.on.net [118.209.100.198]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: dleigh@htns.net) by homiemail-a4.g.dreamhost.com (Postfix) with ESMTPSA id 40D5051C062; Mon, 17 Nov 2014 18:16:15 -0800 (PST) Date: Tue, 18 Nov 2014 13:15:58 +1100 From: Dylan Leigh To: Olivier =?iso-8859-1?Q?Cochard-Labb=E9?= Subject: Re: No more free space after upgrading to 10.1 and zpool upgrade Message-ID: <20141118021558.GA30031@exhan.dylanleigh.net> References: <20141116080128.GA20042@exhan.dylanleigh.net> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: X-Author-WWW: http://www.dylanleigh.net User-Agent: Mutt/1.5.21 (2010-09-15) Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Dylan Leigh X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Nov 2014 02:16:24 -0000

On Sun, Nov 16, 2014 at 04:10:28PM +0100, Olivier Cochard-Labbé wrote:
> On Sun, Nov 16, 2014 at 9:01 AM, Dylan Leigh wrote:
>
> > Could you provide some other details about the pool structure/config,
> > including the output of "zpool status"?
> >
> It's a raidz1 pool build with 5 SATA 2TB drives, and there are 5 zvolumes
> without advanced features (no compression, no snapshot, no de-dup, etc...).
> Because it's a raidz1 pool, I know that FREE space reported by a "zpool
> list" include redundancy overhead and is bigger than AVAIL space reported
> by a "zfs list".
>
> I've moved about 100GB (on hundred GigaByte) of files and after this step
> there were only 2GB (two GigaByte) of Free space only: How is it possible ?

The RAIDZ bit might be significant, if it isn't calculating the overhead properly on old pools for some reason. What were the zpool and FreeBSD versions before/after the upgrade?

The output of "zpool history -i storage"/"zdb -h storage" might shed light on any changes.

Also try "zdb -bb" - this will go through all block pointers and verify there are no leaks; unfortunately it might take as long as a scrub. :(

-- Dylan

-- 
Dylan Leigh // VU# s4081906 // www.dylanleigh.net

From owner-freebsd-fs@FreeBSD.ORG Tue Nov 18 05:44:51 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9F84243E for ; Tue, 18 Nov 2014 05:44:51 +0000 (UTC) Received: from mail-pa0-x22b.google.com (mail-pa0-x22b.google.com [IPv6:2607:f8b0:400e:c03::22b]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 69951FD1 for ; Tue, 18 Nov 2014 05:44:51 +0000 (UTC) Received: by mail-pa0-f43.google.com with SMTP id kx10so3584888pab.16 for ; Mon, 17 Nov 2014 21:44:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=02mHDdS7+pGKCuTxZ8qwhWPw+CNkYUagmz3IBKOFkdc=; b=nlBsSGZaxA0HAsOEnPAn3zVVQhyv60GsAh/jE9Uenn4ZJANh/7aHVFvJ6sUad8ejv6 KCxcX+HBXKeJyWJZwIjTZo1FB/wFZRydh1iVNESNNgP+iULhElh+8hm12bmdcrO2XXrv COpNJy6oVUa7mrpsB96oePtwcSjTLmphAFDQMraNnEoxr1OoQ4IFxA5o2ElohxqgDtT5 IA0+9CPZqd4FO3/4HK0miCRNgTyNtDSyUMoayESxRF0NzsQn9ePD0vo5zeEl+8N7TMPY
nulXBl/xKYd+Q7o+iVEajRJXWoT/iS0+sIhp4t3kBxbugveAHXcFVyB/1myhUYs8DWOr 76Eg== X-Received: by 10.70.44.208 with SMTP id g16mr9335441pdm.130.1416289490993; Mon, 17 Nov 2014 21:44:50 -0800 (PST) Received: from core.summit (2001-44b8-31c7-e401-4a5b-39ff-fe76-f665.static.ipv6.internode.on.net. [2001:44b8:31c7:e401:4a5b:39ff:fe76:f665]) by mx.google.com with ESMTPSA id pc10sm36679038pbb.21.2014.11.17.21.44.48 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 17 Nov 2014 21:44:50 -0800 (PST) Date: Tue, 18 Nov 2014 16:44:43 +1100 From: Emil Mikulic To: freebsd-fs@freebsd.org Subject: Re: No more free space after upgrading to 10.1 and zpool upgrade Message-ID: <20141118054443.GA40514@core.summit> References: <20141116080128.GA20042@exhan.dylanleigh.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Cc: fbsd@dylanleigh.net X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Nov 2014 05:44:51 -0000 On Sun, Nov 16, 2014 at 04:10:28PM +0100, Olivier Cochard-Labb? wrote: > On Sun, Nov 16, 2014 at 9:01 AM, Dylan Leigh wrote: > > > > > Could you provide some other details about the pool structure/config, > > including the output of "zpool status"? > > > > > It's a raidz1 pool build with 5 SATA 2TB drives, and there are 5 zvolumes > without advanced features (no compression, no snapshot, no de-dup, etc...). > Because it's a raidz1 pool, I know that FREE space reported by a "zpool > list" include redundancy overhead and is bigger than AVAIL space reported > by a "zfs list". > > I've moved about 100GB (on hundred GigaByte) of files and after this step > there were only 2GB (two GigaByte) of Free space only: How is it possible ? I had the same problem. 
Very old pool:

History for 'jupiter':
2010-01-20.20:46:00 zpool create jupiter raidz /dev/ad10 /dev/ad12 /dev/ad14

I upgraded FreeBSD 8.3 to 9.0, which I think went fine, but when I upgraded to 10.1, I had 0B AVAIL according to "zfs list" and df(1), even though there was free space according to "zpool list"

# zpool list -p jupiter
NAME     SIZE           ALLOC          FREE          FRAG  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
jupiter  4466765987840  4330587288576  136178699264  30%   -         96   1.00x  ONLINE  -

# zfs list -p jupiter
NAME     USED           AVAIL  REFER  MOUNTPOINT
jupiter  2884237136220  0      46376  /jupiter

Deleting files, snapshots, and child filesystems didn't help, AVAIL stayed at zero bytes... until I deleted enough:

NAME     SIZE           ALLOC          FREE          FRAG  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
jupiter  4466765987840  4320649953280  146116034560  30%   -         96   1.00x  ONLINE  -

NAME     USED           AVAIL       REFER  MOUNTPOINT
jupiter  2877618732010  4350460950  46376  /jupiter

Apparently, the above happened somewhere between 96.0% and 96.9% used.

Any ideas what happened here? It's almost like 100+GB of free space is somehow reserved by the system (and I don't mean "zfs set reservation", those are all "none")

From owner-freebsd-fs@FreeBSD.ORG Tue Nov 18 17:49:00 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0B093765 for ; Tue, 18 Nov 2014 17:49:00 +0000 (UTC) Received: from platinum.linux.pl (platinum.edu.pl [81.161.192.4]) by mx1.freebsd.org (Postfix) with ESMTP id BCABEB67 for ; Tue, 18 Nov 2014 17:48:55 +0000 (UTC) Received: by platinum.linux.pl (Postfix, from userid 87) id 814D345218C; Tue, 18 Nov 2014 18:33:41 +0100 (CET) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on platinum.linux.pl X-Spam-Level: X-Spam-Status: No, score=-1.0 required=3.0 tests=ALL_TRUSTED,AWL, TO_NO_BRKTS_PCNT autolearn=disabled version=3.4.0 Received: from [10.255.0.2]
(c38-073.client.duna.pl [83.151.38.73]) by platinum.linux.pl (Postfix) with ESMTPA id 23F13452086 for ; Tue, 18 Nov 2014 18:33:41 +0100 (CET) Message-ID: <546B8203.5040607@platinum.linux.pl> Date: Tue, 18 Nov 2014 18:29:39 +0100 From: Adam Nowacki User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: No more free space after upgrading to 10.1 and zpool upgrade References: <20141116080128.GA20042@exhan.dylanleigh.net> <20141118054443.GA40514@core.summit> In-Reply-To: <20141118054443.GA40514@core.summit> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Nov 2014 17:49:00 -0000 On 2014-11-18 06:44, Emil Mikulic wrote: > On Sun, Nov 16, 2014 at 04:10:28PM +0100, Olivier Cochard-Labb? wrote: >> On Sun, Nov 16, 2014 at 9:01 AM, Dylan Leigh wrote: >> >>> >>> Could you provide some other details about the pool structure/config, >>> including the output of "zpool status"? >>> >>> >> It's a raidz1 pool build with 5 SATA 2TB drives, and there are 5 zvolumes >> without advanced features (no compression, no snapshot, no de-dup, etc...). >> Because it's a raidz1 pool, I know that FREE space reported by a "zpool >> list" include redundancy overhead and is bigger than AVAIL space reported >> by a "zfs list". >> >> I've moved about 100GB (on hundred GigaByte) of files and after this step >> there were only 2GB (two GigaByte) of Free space only: How is it possible ? > > I had the same problem. 
Very old pool:
>
> History for 'jupiter':
> 2010-01-20.20:46:00 zpool create jupiter raidz /dev/ad10 /dev/ad12 /dev/ad14
>
> I upgraded FreeBSD 8.3 to 9.0, which I think went fine, but when I upgraded
> to 10.1, I had 0B AVAIL according to "zfs list" and df(1), even though there was
> free space according to "zpool list"
>
> # zpool list -p jupiter
> NAME     SIZE           ALLOC          FREE          FRAG  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
> jupiter  4466765987840  4330587288576  136178699264  30%   -         96   1.00x  ONLINE  -
>
> # zfs list -p jupiter
> NAME     USED           AVAIL  REFER  MOUNTPOINT
> jupiter  2884237136220  0      46376  /jupiter
>
> Deleting files, snapshots, and child filesystems didn't help, AVAIL stayed at
> zero bytes... until I deleted enough:
>
> NAME     SIZE           ALLOC          FREE          FRAG  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
> jupiter  4466765987840  4320649953280  146116034560  30%   -         96   1.00x  ONLINE  -
>
> NAME     USED           AVAIL       REFER  MOUNTPOINT
> jupiter  2877618732010  4350460950  46376  /jupiter
>
> Apparently, the above happened somewhere between 96.0% and 96.9% used.
>
> Any ideas what happened here? It's almost like 100+GB of free space is somehow
> reserved by the system (and I don't mean "zfs set reservation", those are all
> "none")

This commit is to blame:
http://svnweb.freebsd.org/base?view=revision&revision=268455

3.125% of disk space is reserved.
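[Editor's note] As a rough cross-check, the 3.125% figure (1/32) can be applied to the pool numbers quoted above. This is only a sketch: taking the raw SIZE column as the base of the reservation is an assumption made for illustration, since the thread does not say which figure ZFS actually uses.

```sh
#!/bin/sh
# Figures copied from the "zpool list -p jupiter" output quoted above.
size=4466765987840
free_before=136178699264
free_after=146116034560

# 3.125% is 1/32. Applying it to the raw SIZE column is an assumption made
# for illustration; the thread does not say which figure ZFS uses as the base.
reserved=$((size / 32))

echo "reserved (1/32 of SIZE): $reserved"
for free in "$free_before" "$free_after"; do
    if [ "$free" -gt "$reserved" ]; then
        echo "FREE=$free: $((free - reserved)) raw bytes above the reserve"
    else
        echo "FREE=$free: $((reserved - free)) raw bytes short of the reserve"
    fi
done
```

Under that assumption the first FREE value falls below the reserve and the second lies above it, which would be consistent with AVAIL going from zero to non-zero between the two listings. The differences are raw raidz bytes, so they are not directly comparable to the AVAIL column of "zfs list".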
From owner-freebsd-fs@FreeBSD.ORG Tue Nov 18 19:00:39 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 57874987 for ; Tue, 18 Nov 2014 19:00:39 +0000 (UTC) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "anubis.delphij.net", Issuer "StartCom Class 1 Primary Intermediate Server CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 37423335 for ; Tue, 18 Nov 2014 19:00:39 +0000 (UTC) Received: from zeta.ixsystems.com (unknown [12.229.62.2]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by anubis.delphij.net (Postfix) with ESMTPSA id 7631820357; Tue, 18 Nov 2014 11:00:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphij.net; s=anubis; t=1416337238; x=1416351638; bh=bVXijf5ZAK1ol69Qo+VV4btz0myLoIpPxnWR8AVK3jA=; h=Date:From:Reply-To:To:Subject:References:In-Reply-To; b=Q28FNDv0j8uE52KNimlsZHTJulBqbySjHkspVsQa5NxmU7g1xs8v/IyUVmr4Fl5bF YID4/ur+RU9JXYwVs6AdixywjdtyBeWJbv4dJ5MAQDLMaZAh4xAcWKondu9xLn3D0g jm01kvSTyAm2gcAS8GhvTLUN/Lm3HqT82xG4MTDM= Message-ID: <546B9754.4060906@delphij.net> Date: Tue, 18 Nov 2014 11:00:36 -0800 From: Xin Li Reply-To: d@delphij.net Organization: The FreeBSD Project MIME-Version: 1.0 To: Adam Nowacki , freebsd-fs@freebsd.org Subject: Re: No more free space after upgrading to 10.1 and zpool upgrade References: <20141116080128.GA20042@exhan.dylanleigh.net> <20141118054443.GA40514@core.summit> <546B8203.5040607@platinum.linux.pl> In-Reply-To: <546B8203.5040607@platinum.linux.pl> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list 
List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Nov 2014 19:00:39 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 11/18/14 09:29, Adam Nowacki wrote: > On 2014-11-18 06:44, Emil Mikulic wrote: >> On Sun, Nov 16, 2014 at 04:10:28PM +0100, Olivier Cochard-Labb? >> wrote: >>> On Sun, Nov 16, 2014 at 9:01 AM, Dylan Leigh >> dylanleigh.net> wrote: >>> >>>> >>>> Could you provide some other details about the pool >>>> structure/config, including the output of "zpool status"? >>>> >>>> >>> It's a raidz1 pool build with 5 SATA 2TB drives, and there are >>> 5 zvolumes without advanced features (no compression, no >>> snapshot, no de-dup, etc...). Because it's a raidz1 pool, I >>> know that FREE space reported by a "zpool list" include >>> redundancy overhead and is bigger than AVAIL space reported by >>> a "zfs list". >>> >>> I've moved about 100GB (on hundred GigaByte) of files and after >>> this step there were only 2GB (two GigaByte) of Free space >>> only: How is it possible ? >> >> I had the same problem. Very old pool: >> >> History for 'jupiter': 2010-01-20.20:46:00 zpool create jupiter >> raidz /dev/ad10 /dev/ad12 /dev/ad14 >> >> I upgraded FreeBSD 8.3 to 9.0, which I think went fine, but when >> I upgraded to 10.1, I had 0B AVAIL according to "zfs list" and >> df(1), even though there was free space according to "zpool >> list" >> >> # zpool list -p jupiter NAME SIZE ALLOC FREE FRAG >> EXPANDSZ CAP DEDUP HEALTH ALTROOT jupiter 4466765987840 >> 4330587288576 136178699264 30% - 96 1.00x >> ONLINE - >> >> # zfs list -p jupiter NAME USED >> AVAIL REFER MOUNTPOINT jupiter >> 2884237136220 0 46376 /jupiter >> >> Deleting files, snapshots, and child filesystems didn't help, >> AVAIL stayed at zero bytes... 
until I deleted enough: >> >> NAME SIZE ALLOC FREE FRAG EXPANDSZ CAP DEDUP >> HEALTH ALTROOT jupiter 4466765987840 4320649953280 >> 146116034560 30% - 96 1.00x ONLINE - >> >> NAME USED AVAIL REFER MOUNTPOINT jupiter >> 2877618732010 4350460950 46376 /jupiter >> >> Apparently, the above happened somewhere between 96.0% and 96.9% >> used. >> >> Any ideas what happened here? It's almost like 100+GB of free >> space is somehow reserved by the system (and I don't mean "zfs >> set reservation", those are all "none") > > This commit is to blame: > http://svnweb.freebsd.org/base?view=revision&revision=268455 > > 3.125% of disk space is reserved. Note that the reserved space is so that one can always delete files, etc. to get the pool back to a usable state. I've added a new tunable/sysctl in r274674, but note that tuning is not recommended: by using too much space the pool would become read-only permanently and one will have to dump data and recreate the pool. Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! 
Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0 iQIcBAEBCgAGBQJUa5dUAAoJEJW2GBstM+nsyiUP/0KuNeTrocsQPrZ8YnsDGHyd QXFDdZ9B9RTD3GygUwLZIAX0st1pCy28sTfF1Ph54rfq2DkIJaJwUzlOeOTceNup hppXcYah5kX4YnnVek73+W6JZWxUV9MbSpOYn6bhyItyRtbv2dDpytJa6D6uKggq rkpoU1tyIQLZJPZ5m9pL7h3XvxZpHRJLSqD7JlYr9aXzqFDoXxq5vvD6tZkpkx7f sFhcSDEPb7wKbPA+UbQ+YvycMJyEqKDgdOWvqC1puSGPqRzN8WZcM8Qw/Rs9wpsl QiCK1OJQwO1RBIJUJq9SVyCE08lDDvMrG+3kEemCac8p066/15Vpxoqu818mskfS 0MA6CUQMAepjHoyntd6vokWGu6O9Lx92pRa11/RfQ5xql29hmOz3dXBtcIX7ApJQ Wxcvip+2yLaeDMw0bc0M1nxUpuQUPbf4Rob0li8T0S7g26Dll84FUBAWV/1F5hh2 +7i3Tt385opZgautZDEiGk0MZFLb+2EdqXxOWi479vJ3rS1Q2oGsgiPH8PQ4Lrog QmQ9KplmxIOyrIVdMUrq+ywHWY6nMA5buKsXTEaLKqghCM2mJK8n9LpwZoAkfG9y Ueko1GssfYhuJ++VQrOAFYcf7voxSrlj4XPdzofS+lrDQ7FluF8b+iFWjz/IEQug b9PAF4KyBdPfdWMqQgW9 =e2r4 -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Wed Nov 19 01:36:18 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 192F6815 for ; Wed, 19 Nov 2014 01:36:18 +0000 (UTC) Received: from mail-pd0-x231.google.com (mail-pd0-x231.google.com [IPv6:2607:f8b0:400e:c02::231]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D66FE1B2 for ; Wed, 19 Nov 2014 01:36:17 +0000 (UTC) Received: by mail-pd0-f177.google.com with SMTP id ft15so3566622pdb.22 for ; Tue, 18 Nov 2014 17:36:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=jU0yGwi1VlRTBW0YOpssLzWk4ee1KiPIdvF72UaVqo0=; b=o9esrpbdMRJY1Zb7aQiTOUXIr1qSPLVSmsP8E3qZjqICLoplEBPDzUbN2JxFEVApoa 
f1QIyxPJ8XY2Ds9+WL/ksdoS1bAVQl0MFIvNUtSRHdORfB9XgTM85PTwY+jIf81YWcR5 X/P+3bea6OyQKIAgtjEKFw63KJIlsikkx1l6sYqfUCP/ItA15p/Oia9holsggvLuCjyI MXB1cdmpXsT59nWb5jYIzsnWcsn36pBoG0D+ZHXrQLPKeBCpl7oZFOXskQIV7cCMP3sg DaMTnR8OVMLdduSgKusGAyD6YBnzH5IzcbGlweKILa4sO/6a5c9aKmLpmyKjChmsTM6A nCpQ== X-Received: by 10.67.6.1 with SMTP id cq1mr13001408pad.23.1416360977376; Tue, 18 Nov 2014 17:36:17 -0800 (PST) Received: from core.summit (2001-44b8-31c7-e401-4a5b-39ff-fe76-f665.static.ipv6.internode.on.net. [2001:44b8:31c7:e401:4a5b:39ff:fe76:f665]) by mx.google.com with ESMTPSA id s5sm106045pdc.52.2014.11.18.17.36.14 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 18 Nov 2014 17:36:16 -0800 (PST) Date: Wed, 19 Nov 2014 12:36:11 +1100 From: Emil Mikulic To: d@delphij.net Subject: Re: No more free space after upgrading to 10.1 and zpool upgrade Message-ID: <20141119013611.GA52102@core.summit> References: <20141116080128.GA20042@exhan.dylanleigh.net> <20141118054443.GA40514@core.summit> <546B8203.5040607@platinum.linux.pl> <546B9754.4060906@delphij.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <546B9754.4060906@delphij.net> User-Agent: Mutt/1.5.23 (2014-03-12) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2014 01:36:18 -0000 On Tue, Nov 18, 2014 at 11:00:36AM -0800, Xin Li wrote: > On 11/18/14 09:29, Adam Nowacki wrote: > > This commit is to blame: > > http://svnweb.freebsd.org/base?view=revision&revision=268455 > > > > 3.125% of disk space is reserved. This is the sort of thing I suspected, but I didn't spot this commit. > Note that the reserved space is so that one can always delete files, > etc. to get the pool back to a usable state. What about the "truncate -s0" trick? That doesn't work reliably? 
> I've added a new tunable/sysctl in r274674, but note that tuning is > not recommended Thanks!! Can you give us an example of how (and when) to tune the sysctl? Regarding r268455, this is kind of a gotcha for people who are running their pools close to full - should this be mentioned in UPDATING or in the release notes? I understand that ZFS needs free space to be able to free more space, but 3% of a large pool is a lot of bytes. From owner-freebsd-fs@FreeBSD.ORG Wed Nov 19 02:34:45 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CD40B4AC for ; Wed, 19 Nov 2014 02:34:45 +0000 (UTC) Received: from anubis.delphij.net (anubis.delphij.net [IPv6:2001:470:1:117::25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "anubis.delphij.net", Issuer "StartCom Class 1 Primary Intermediate Server CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id AC7A298F for ; Wed, 19 Nov 2014 02:34:45 +0000 (UTC) Received: from zeta.ixsystems.com (unknown [12.229.62.2]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by anubis.delphij.net (Postfix) with ESMTPSA id 657B822D99; Tue, 18 Nov 2014 18:34:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphij.net; s=anubis; t=1416364485; x=1416378885; bh=+W7SmbTaYoOcWF+PDHNWQGFlPbT6ocVXxQy8RWv/2hM=; h=Date:From:Reply-To:To:CC:Subject:References:In-Reply-To; b=yjjyYOf2qJfUdTPzndjFT1NQYcBQ/JlPgpHkg6O58Bgay5B+pRYygJxwggR8dSO63 P7d7wldIi5Y1S3g7S3GsEYM2jOCtejrrCjj6jS1J8QunDdjYJlkD/gcLrDg6nr1LSA SApQPs6QuDzFoFXjZ89J2qhf+OZJdg1jnPcw14uE= Message-ID: <546C01C5.7080605@delphij.net> Date: Tue, 18 Nov 2014 18:34:45 -0800 From: Xin Li Reply-To: d@delphij.net Organization: The FreeBSD Project MIME-Version: 1.0 To: Emil 
Mikulic , d@delphij.net Subject: Re: No more free space after upgrading to 10.1 and zpool upgrade References: <20141116080128.GA20042@exhan.dylanleigh.net> <20141118054443.GA40514@core.summit> <546B8203.5040607@platinum.linux.pl> <546B9754.4060906@delphij.net> <20141119013611.GA52102@core.summit> In-Reply-To: <20141119013611.GA52102@core.summit> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2014 02:34:46 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 On 11/18/14 17:36, Emil Mikulic wrote: > On Tue, Nov 18, 2014 at 11:00:36AM -0800, Xin Li wrote: >> On 11/18/14 09:29, Adam Nowacki wrote: >>> This commit is to blame: >>> http://svnweb.freebsd.org/base?view=revision&revision=268455 >>> >>> 3.125% of disk space is reserved. > > This is the sort of thing I suspected, but I didn't spot this > commit. > >> Note that the reserved space is so that one can always delete >> files, etc. to get the pool back to a usable state. > > What about the "truncate -s0" trick? That doesn't work reliably? > >> I've added a new tunable/sysctl in r274674, but note that tuning >> is not recommended > > Thanks!! > > Can you give us an example of how (and when) to tune the sysctl? sysctl vfs.zfs.spa_slop_shift=6 would tune down the reserved space to 1/(2^6) (=1.5625%). Personally I would never tune it. At this level of space your pool is already running at degraded performance, by the way. Don't do that. > Regarding r268455, this is kind of a gotcha for people who are > running their pools close to full - should this be mentioned in > UPDATING or in the release notes? > > I understand that ZFS needs free space to be able to free more > space, but 3% of a large pool is a lot of bytes. 
Well, if you look at UFS, the reservation ratio is about 7.5% (8/108). File systems need free space to do allocation efficiently; even with mostly static contents, performance would suffer because at high level of space usage the file system would spend more time on looking up for free space and the resulted allocation is likely to be more fragmented. For ZFS, this means many essential operations like resilvering would be much slower, which is a real threat to data recoverability. Cheers, - -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0 iQIcBAEBCgAGBQJUbAHFAAoJEJW2GBstM+nsy1cP/2LSiMfdINJFVThhOm23AQjc 0nAs7Z5FtahAHGlM5hJh2b5eaAqeVNh/Kc3Bf3egDVI9AQo7ZIUG2yR2BufMQ77O QbH2U4/wZRWm1Gw3QXhDpX37OjfcLFeopJ0ls3316ds7zfX4MXoHr/Zah+3gmrC2 3f7d7drmwTuIKYI22hQOiE72SYpZcJz3dFDc8ayn3JSLv5EOPxum7nVgrS1EgOps 4luP87wA6aR1sC4tMumsIHXPdqQJdSxPvClyzwAHDQu36f42myWQoJosgyTdmujK PoT/0RMVRs8tZkPBGejZVjumhkNAHWNs9glLhzReGy12Vvk8EVoV/AbvkFWSMO9l WS+6pwNCRYt7UWbl/uPdLwyd2+UpAzI+A/IFdNJAuxDK2KeAxaynVsvvmmtrH7JP JUt79yyDQmJhPribVqzMfjYp6oFJJHyEuQcMnrog2S+x/mx1Lz908Qk1Ox2izU5F SF3yK19ol+khYM1yDuF5ikiWHI0DJXtGPpEKeF82tpUhfgb7LYW2hITrFzdO5HfA BjhpX4y8lWlCub4Ji/275gTUCuKp5TDzSUqFn38uc/iG0IaV4jESENP/ZDA/PD64 RXS0nHjwBzvVHEWnD7fkEI8pzOY3iXvZMrnRjG4lL/RmkmmNSN50Q2vNRMn1tw1K TJu/wf7vrPMlwt0eLx+u =0Xf8 -----END PGP SIGNATURE----- From owner-freebsd-fs@FreeBSD.ORG Wed Nov 19 04:12:49 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 129D8868 for ; Wed, 19 Nov 2014 04:12:49 +0000 (UTC) Received: from mail-wi0-f169.google.com (mail-wi0-f169.google.com [209.85.212.169]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by 
mx1.freebsd.org (Postfix) with ESMTPS id 99B555E8 for ; Wed, 19 Nov 2014 04:12:48 +0000 (UTC) Received: by mail-wi0-f169.google.com with SMTP id r20so7677515wiv.2 for ; Tue, 18 Nov 2014 20:12:41 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to :subject:references:in-reply-to:content-type :content-transfer-encoding; bh=e+GLIpQPR9bSFzCqRMXNUhWJ8bo/57uxd1wZgyhW9rU=; b=eCBdt0Z7N+nWy47CuR4rcWz8qGZalAjnOI+ee+Bppf+YbWGGXrhTOnSiRI8TjRURE0 EmLDZxo5U6ya8yqLIGxNFemLBBff/jHCfoRcNY9ELCtmcDvdYVAosT9kaLZMO6GfeyLj fTU0Et9zSO1UopF9V43jJJqkTYHzxdm837UibTokmio1SFaJl9w002wTErBjrVdVikf3 UtaqPrzBSwqSxGU9+nqMchRe5wsWrCC+gPor6UWqnb2sZnorL4K7MNQU3LL7xTI3fi9Q EMuDL00hTcst4zKYwYP5tmutbGRrHIx41UD7PgunS5unVK8Kgtd5aNEVNTMsq/x8j823 W9mQ== X-Gm-Message-State: ALoCoQn050G/7870no6gNMOd8vjQbpOZiNp3fQyOkyj9SFWGHcOCzPXotfdeGMsvjW8dMOo53QIi X-Received: by 10.180.11.168 with SMTP id r8mr1594197wib.74.1416370360926; Tue, 18 Nov 2014 20:12:40 -0800 (PST) Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk. 
[82.69.141.170]) by mx.google.com with ESMTPSA id mw7sm472629wib.14.2014.11.18.20.12.38 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 18 Nov 2014 20:12:39 -0800 (PST) Message-ID: <546C18F4.1090209@multiplay.co.uk> Date: Wed, 19 Nov 2014 04:13:40 +0000 From: Steven Hartland User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: No more free space after upgrading to 10.1 and zpool upgrade References: <20141116080128.GA20042@exhan.dylanleigh.net> <20141118054443.GA40514@core.summit> <546B8203.5040607@platinum.linux.pl> <546B9754.4060906@delphij.net> <20141119013611.GA52102@core.summit> <546C01C5.7080605@delphij.net> In-Reply-To: <546C01C5.7080605@delphij.net> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2014 04:12:49 -0000 On 19/11/2014 02:34, Xin Li wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA512 > > On 11/18/14 17:36, Emil Mikulic wrote: >> On Tue, Nov 18, 2014 at 11:00:36AM -0800, Xin Li wrote: >>> On 11/18/14 09:29, Adam Nowacki wrote: >>>> This commit is to blame: >>>> http://svnweb.freebsd.org/base?view=revision&revision=268455 >>>> >>>> 3.125% of disk space is reserved. >> This is the sort of thing I suspected, but I didn't spot this >> commit. >> >>> Note that the reserved space is so that one can always delete >>> files, etc. to get the pool back to a usable state. >> What about the "truncate -s0" trick? That doesn't work reliably? >> >>> I've added a new tunable/sysctl in r274674, but note that tuning >>> is not recommended >> Thanks!! >> >> Can you give us an example of how (and when) to tune the sysctl? 
> sysctl vfs.zfs.spa_slop_shift=6 would tune down the reserved space to > 1/(2^6) (=1.5625%). > > Personally I would never tune it. At this level of space your pool is > already running at degraded performance, by the way. Don't do that. > >> Regarding r268455, this is kind of a gotcha for people who are >> running their pools close to full - should this be mentioned in >> UPDATING or in the release notes? >> >> I understand that ZFS needs free space to be able to free more >> space, but 3% of a large pool is a lot of bytes. > Well, if you look at UFS, the reservation ratio is about 7.5% (8/108). > > File systems need free space to do allocation efficiently; even with > mostly static contents, performance would suffer because at high level > of space usage the file system would spend more time on looking up for > free space and the resulted allocation is likely to be more > fragmented. For ZFS, this means many essential operations like > resilvering would be much slower, which is a real threat to data > recoverability. > The new space map code should help with that, and a fixed 3.125% is a large portion of a decent sized pool. On our event cache box, for example, that's 256GB, which feels like a silly amount to reserve. Does anyone have any stats which back up the need for this amount of free space on large pool arrays, specifically with spacemaps enabled?
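For reference, the arithmetic behind the percentages in this thread can be sketched as below. This is only an illustration, not the ZFS implementation: it assumes the reserve is simply the pool size divided by 2^spa_slop_shift, takes the default shift of 5 from the 3.125% figure quoted above, and uses a hypothetical 8 TiB pool (the size at which 3.125% comes to 256 GiB). The function name `slop_bytes` is made up for this sketch.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3


def slop_bytes(pool_bytes: int, slop_shift: int = 5) -> int:
    """Approximate reserved space, assuming reserve = pool size / 2**slop_shift."""
    return pool_bytes >> slop_shift


pool = 8 * TIB  # hypothetical pool size, chosen for illustration only
for shift in (5, 6):
    reserved = slop_bytes(pool, shift)
    # 100 / 2**shift is the reserved fraction expressed as a percentage
    print(f"shift={shift}: {reserved // GIB} GiB reserved "
          f"({100 / 2 ** shift:.4f}% of the pool)")
```

Under these assumptions, raising the shift from 5 to 6 halves the reserve, which matches the 3.125% and 1.5625% figures given earlier in the thread.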
From owner-freebsd-fs@FreeBSD.ORG Wed Nov 19 07:06:59 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B25CED41 for ; Wed, 19 Nov 2014 07:06:59 +0000 (UTC) Received: from na01-bn1-obe.outbound.protection.outlook.com (mail-bn1on0088.outbound.protection.outlook.com [157.56.110.88]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 53D17844 for ; Wed, 19 Nov 2014 07:06:58 +0000 (UTC) Received: from DM2PR0801MB0944.namprd08.prod.outlook.com (25.160.131.27) by DM2PR0801MB0942.namprd08.prod.outlook.com (25.160.131.25) with Microsoft SMTP Server (TLS) id 15.1.16.15; Wed, 19 Nov 2014 07:06:49 +0000 Received: from DM2PR0801MB0944.namprd08.prod.outlook.com ([25.160.131.27]) by DM2PR0801MB0944.namprd08.prod.outlook.com ([25.160.131.27]) with mapi id 15.01.0016.006; Wed, 19 Nov 2014 07:06:49 +0000 From: "Pokala, Ravi" To: "borjam@sarenet.es" , "freebsd-fs@freebsd.org" Subject: Re: BIOS booting from disks > 2TB Thread-Topic: BIOS booting from disks > 2TB Thread-Index: AQHQA8djh8Fld4MMbk2TOr3Zp8LvCg== Date: Wed, 19 Nov 2014 07:06:49 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: user-agent: Microsoft-MacOutlook/14.4.6.141106 x-originating-ip: [24.6.178.251] x-microsoft-antispam: BCL:0;PCL:0;RULEID:;SRVR:DM2PR0801MB0942; x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:; SRVR:DM2PR0801MB0942; x-forefront-prvs: 04004D94E2 x-forefront-antispam-report: SFV:NSPM; 
SFS:(10009020)(6009001)(199003)(189002)(122556002)(83506001)(36756003)(40100003)(97736003)(2656002)(92566001)(86362001)(66066001)(92726001)(4396001)(87936001)(64706001)(106356001)(20776003)(2501002)(21056001)(106116001)(105586002)(77096003)(77156002)(62966003)(95666004)(99396003)(99286002)(46102003)(120916001)(50986999)(54356999)(107886001)(101416001)(31966008)(107046002); DIR:OUT; SFP:1101; SCL:1; SRVR:DM2PR0801MB0942; H:DM2PR0801MB0944.namprd08.prod.outlook.com; FPR:; MLV:sfv; PTR:InfoNoRecords; A:1; MX:1; LANG:en; Content-Type: text/plain; charset="us-ascii" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: panasas.com X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2014 07:06:59 -0000 Hi Borja, I'd like to clarify something important - you don't need to boot using UEFI to use GPT for partitioning. You *do* need to use GPT to have partitions larger than 2^32 sectors (2TiB for drives w/ 512B sectors, or 16TiB for drives w/ 4KB sectors). When you perform your installation, just make sure to select the GPT option for partitioning. The installer (either `bsdinstall' (for stock FreeBSD), or `pc-sysinstall' (for PC-BSD / FreeNAS)) should create both primary (near start-of-disk) and backup (at end-of-disk) GPT tables, and install the appropriate bootstrap code in the proper locations. Hope that helps, Ravi -------- Details, for those who are interested: The firmware reads LBA 0 to find the Master Boot Record (MBR). 
In both cases:
 - the first 446 bytes are executable code
 - the next 64 bytes are the slice table
 - the last 2 bytes are the MBR signature

----

In the slice/label partitioning scheme (some of the details might be slightly off, but the big picture is right):
 - the 446-byte executable code [mbr.s] reads the slice table, finds the first-stage bootloader [boot0.S], and executes it
 - the first-stage bootloader finds the second-stage bootloader [boot1.S], and executes it
 - the second-stage bootloader reads the disklabel, finds the third-stage bootloader [boot2.c], and executes it
 - the third-stage bootloader reads the UFS filesystem, finds the `loader' binary, and executes it

----

In the case of GPT, the slice-table portion of the MBR contains a single slice entry which covers the whole disk. This protects a non-GPT-aware slicer from trying to add slices. For this reason, it is referred to as the Protective MBR (PMBR).

In the GPT scheme (I'm more confident about this):
 - the 446-byte executable code [pmbr.s] finds the GPT Header
 - based on the GPT Header, the PMBR finds the GPT Table
 - based on the GPT Table, the PMBR finds the GPT bootstrap [gptldr.S, gptboot.c], and executes it
 - the GPT bootstrap reads the UFS filesystem, finds the `loader' binary, and executes it

Note that, by not having to find and parse a separate disklabel in addition to a partition table, the GPT bootstrap actually has fewer stages.
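The 446 + 64 + 2 byte layout described above can be sketched as a small parser. This is a rough illustration, not code from the FreeBSD tree: the function names are hypothetical, and the 0x55 0xAA signature bytes, the 16-byte entry layout, and the 0xEE "GPT protective" slice type are conventional values that are not stated in this thread.

```python
import struct

SECTOR_SIZE = 512
BOOT_CODE_LEN = 446          # executable bootstrap code
ENTRY_LEN = 16               # one slice-table entry
NUM_ENTRIES = 4              # 4 * 16 = 64 bytes of slice table
SIGNATURE = b"\x55\xaa"      # assumed conventional MBR signature (last 2 bytes)
GPT_PROTECTIVE_TYPE = 0xEE   # assumed conventional type of the PMBR's single slice


def parse_mbr(sector: bytes) -> list:
    """Return the non-empty slice entries found in a 512-byte LBA 0 image."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected a 512-byte sector")
    if sector[510:512] != SIGNATURE:
        raise ValueError("missing MBR signature")
    slices = []
    for i in range(NUM_ENTRIES):
        off = BOOT_CODE_LEN + i * ENTRY_LEN
        # Entry: flag(1) CHS start(3) type(1) CHS end(3) LBA start(4) count(4)
        flag, ptype, lba_start, count = struct.unpack(
            "<B3xB3xII", sector[off:off + ENTRY_LEN])
        if ptype != 0:
            slices.append({"index": i, "active": flag == 0x80,
                           "type": ptype, "lba_start": lba_start,
                           "sectors": count})
    return slices


def is_protective_mbr(slices: list) -> bool:
    """True if the table holds exactly one slice of the protective type."""
    return len(slices) == 1 and slices[0]["type"] == GPT_PROTECTIVE_TYPE


def make_demo_pmbr() -> bytes:
    """Build a synthetic PMBR-like sector for demonstration purposes only."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack("<B3xB3xII", 0x00, GPT_PROTECTIVE_TYPE, 1, 0xFFFFFFFF)
    sector[BOOT_CODE_LEN:BOOT_CODE_LEN + ENTRY_LEN] = entry
    sector[510:512] = SIGNATURE
    return bytes(sector)


demo_slices = parse_mbr(make_demo_pmbr())
print(demo_slices, is_protective_mbr(demo_slices))
```

The demo sector is synthetic; to try this against a real disk one would read the first 512 bytes of the device instead, which is left out here.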
-------- Source, for those who are *really* interested: [mbr.s] - sys/boot/i386/mbr [boot0.S] - sys/boot/i386/boot0 [boot1.S] - sys/boot/i386/boot2 # Yes, boot1 is in the boot2 directory [boot2.c] - sys/boot/i386/boot2 [pmbr.s] - sys/boot/i386/pmbr [gptldr.S, gptboot.c] - sys/boot/i386/gptboot --rpokala From owner-freebsd-fs@FreeBSD.ORG Wed Nov 19 07:20:34 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D0F8F3CB for ; Wed, 19 Nov 2014 07:20:34 +0000 (UTC) Received: from cu01176b.smtpx.saremail.com (cu01176b.smtpx.saremail.com [195.16.151.151]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9019C97A for ; Wed, 19 Nov 2014 07:20:34 +0000 (UTC) Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11]) by proxypop04.sare.net (Postfix) with ESMTPSA id 5B0C59DF8AC; Wed, 19 Nov 2014 08:13:29 +0100 (CET) Subject: Re: BIOS booting from disks > 2TB Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=us-ascii From: Borja Marcos In-Reply-To: Date: Wed, 19 Nov 2014 08:13:25 +0100 Content-Transfer-Encoding: 7bit Message-Id: References: To: "Pokala, Ravi" X-Mailer: Apple Mail (2.1283) Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2014 07:20:34 -0000 On Nov 19, 2014, at 8:06 AM, Pokala, Ravi wrote: > Hi Borja, > > I'd like to clarify something important - you don't need to boot using > UEFI to use GPT for partitioning. 
You *do* need to use GPT to have > partitions larger than 2^32 sectors (2TiB for drives w/ 512B sectors, or > 16TiB for drives w/ 4KB sectors). I know :) Something went wrong, though. I will repeat paying extra attention to all the steps and report back. Thanks! From owner-freebsd-fs@FreeBSD.ORG Wed Nov 19 09:32:25 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D8BF7E50 for ; Wed, 19 Nov 2014 09:32:25 +0000 (UTC) Received: from lucifer.we.lc.ehu.es (lucifer.we.lc.ehu.es [158.227.6.50]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "lucifer.we.lc.ehu.es", Issuer "CA Dpto Electricidad y Electronica" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 6530AA09 for ; Wed, 19 Nov 2014 09:32:24 +0000 (UTC) Received: from ncc-1701.we.lc.ehu.es (ncc-1701.we.lc.ehu.es [158.227.6.85]) (authenticated bits=0) by lucifer.we.lc.ehu.es (8.13.1/8.13.1) with ESMTP id sAJ96hGR092990 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Wed, 19 Nov 2014 10:06:43 +0100 (CET) (envelope-from jose@we.lc.ehu.es) From: =?utf-8?Q?Jos=C3=A9_Mar=C3=ADa_Alcaide?= Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Subject: Re: BIOS booting from disks > 2TB Message-Id: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> Date: Wed, 19 Nov 2014 10:06:43 +0100 To: freebsd-fs@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 8.1 \(1993\)) X-Mailer: Apple Mail (2.1993) X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (lucifer.we.lc.ehu.es [158.227.6.50]); Wed, 19 Nov 2014 10:06:43 +0100 (CET) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Wed, 19 Nov 2014 09:32:25 -0000 On Nov 19, 2014, at 8:06 AM, Pokala, Ravi wrote: > When you perform your installation, just make sure to select the GPT > option for partitioning. The installer (either `bsdinstall' (for stock > FreeBSD), or `pc-sysinstall' (for PC-BSD / FreeNAS)) should create both > primary (near start-of-disk) and backup (at end-of-disk) GPT tables, and > install the appropriate bootstrap code in the proper locations. > Yes, bsdinstall flawlessly creates both primary and backup GPT tables even using disks > 2 TB, by virtue of the FreeBSD kernel. The problem arises at the first stages of booting, when gptboot tries to compare the primary and backup tables *using the BIOS disk services*, which are not able to reach anything after the 2 TB limit. As a consequence gptboot fails, stating that it did not find the GPT backup table. -- Jose M. Alcaide From owner-freebsd-fs@FreeBSD.ORG Wed Nov 19 11:09:36 2014 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D3C098F3 for ; Wed, 19 Nov 2014 11:09:36 +0000 (UTC) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 2D703653 for ; Wed, 19 Nov 2014 11:09:35 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA21858; Wed, 19 Nov 2014 13:11:19 +0200 (EET) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Xr38Y-0004dO-Hr; Wed, 19 Nov 2014 13:09:26 +0200 Message-ID: <546C7A29.9020103@FreeBSD.org> Date: Wed, 19 Nov 2014 13:08:25 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0)
Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Steven Hartland , freebsd-fs@FreeBSD.org Subject: Re: No more free space after upgrading to 10.1 and zpool upgrade References: <20141116080128.GA20042@exhan.dylanleigh.net> <20141118054443.GA40514@core.summit> <546B8203.5040607@platinum.linux.pl> <546B9754.4060906@delphij.net> <20141119013611.GA52102@core.summit> <546C01C5.7080605@delphij.net> <546C18F4.1090209@multiplay.co.uk> In-Reply-To: <546C18F4.1090209@multiplay.co.uk> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2014 11:09:36 -0000 On 19/11/2014 06:13, Steven Hartland wrote: > The new space map code should help with that and a fixed 3.125% is a large > portion of a decent sized pool. > > On our event cache box for example thats 256GB which feels like silly amount to > reserve. > > Does anyone have any stats which backup the need for this amount of free space > on large pool arrays, specifically with spacemaps enabled? There was a presentation about the spacemap code at the OpenZFS Devsummit. One point is that going above 95% will kill performance (actually sooner). This is applicable to all spacemap algorithms currently available. 
There is a pretty graph at page 13 here http://open-zfs.org/w/images/3/31/Performance-George_Wilson.pdf -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Nov 19 15:43:29 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id F1D4AE38 for ; Wed, 19 Nov 2014 15:43:28 +0000 (UTC) Received: from na01-bn1-obe.outbound.protection.outlook.com (mail-bn1bon0078.outbound.protection.outlook.com [157.56.111.78]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 87D1294B for ; Wed, 19 Nov 2014 15:43:27 +0000 (UTC) Received: from DM2PR0801MB0944.namprd08.prod.outlook.com (25.160.131.27) by DM2PR0801MB0944.namprd08.prod.outlook.com (25.160.131.27) with Microsoft SMTP Server (TLS) id 15.1.16.15; Wed, 19 Nov 2014 15:43:19 +0000 Received: from DM2PR0801MB0944.namprd08.prod.outlook.com ([25.160.131.27]) by DM2PR0801MB0944.namprd08.prod.outlook.com ([25.160.131.27]) with mapi id 15.01.0016.006; Wed, 19 Nov 2014 15:43:19 +0000 From: "Pokala, Ravi" To: "jose@we.lc.ehu.es" , Borja Marcos Subject: Re: BIOS booting from disks > 2TB Thread-Topic: BIOS booting from disks > 2TB Thread-Index: AQHQBA+Kh8Fld4MMbk2TOr3Zp8LvCg== Date: Wed, 19 Nov 2014 15:43:18 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: user-agent: Microsoft-MacOutlook/14.4.6.141106 x-originating-ip: [24.6.178.251] x-microsoft-antispam: BCL:0;PCL:0;RULEID:;SRVR:DM2PR0801MB0944; x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:; SRVR:DM2PR0801MB0944; x-forefront-prvs: 04004D94E2 x-forefront-antispam-report: SFV:NSPM; 
SFS:(10009020)(6009001)(199003)(24454002)(189002)(377454003)(51704005)(62966003)(97736003)(31966008)(106356001)(99286002)(101416001)(77156002)(4396001)(77096003)(66066001)(107046002)(46102003)(36756003)(2656002)(86362001)(19580395003)(54356999)(92726001)(122556002)(20776003)(99396003)(120916001)(64706001)(21056001)(19580405001)(83506001)(2501002)(105586002)(50986999)(95666004)(106116001)(87936001)(40100003)(92566001); DIR:OUT; SFP:1101; SCL:1; SRVR:DM2PR0801MB0944; H:DM2PR0801MB0944.namprd08.prod.outlook.com; FPR:; MLV:sfv; PTR:InfoNoRecords; A:1; MX:1; LANG:en; Content-Type: text/plain; charset="iso-8859-1" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: panasas.com Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2014 15:43:29 -0000 >Date: Wed, 19 Nov 2014 10:06:43 +0100 >From: José María Alcaide >To: freebsd-fs@freebsd.org >Subject: Re: BIOS booting from disks > 2TB >Message-ID: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> >Content-Type: text/plain; charset=us-ascii > >On Nov 19, 2014, at 8:06 AM, Pokala, Ravi wrote: > >> When you perform your installation, just make sure to select the GPT >> option for partitioning. The installer (either `bsdinstall' (for stock >> FreeBSD), or `pc-sysinstall' (for PC-BSD / FreeNAS)) should create both >> primary (near start-of-disk) and backup (at end-of-disk) GPT tables, and >> install the appropriate bootstrap code in the proper locations. >> > >Yes, bsdinstall flawlessly creates both primary and backup GPT tables >even using disks > 2 TB, by virtue of the FreeBSD kernel. The problem >arises at the first stages of booting, when gptboot tries to compare the >primary and backup tables *using the BIOS disk services*, which are not >able to reach anything after the 2 TB limit.
As a consequence gptboot >fails, stating that it did not find the GPT backup table. José: Ah, I see what you're saying. That sounds reasonable. I never saw those warnings, because the version of the PMBR that I'm using at work is fairly old; it pre-dates the code to check the backup GPT if the primary is invalid [r239060]. The fact that this message is coming up at all means the primary GPT is broken. :-( Borja: I'd try booting from a different device (network, USB), then see if `gpart show' is able to list the partitions on the drive in question. If it is, then the secondary GPT is okay, and you may be able to use `gpart backup' to save out the parsed partition table. You could then use `gpart restore' to re-write the partition table to both primary and backup locations. I say that having never actually *done* it, so proceed with caution, and let us know what happens. Good luck! -Ravi >-- >Jose M. Alcaide From owner-freebsd-fs@FreeBSD.ORG Wed Nov 19 17:29:28 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 94BCBB41 for ; Wed, 19 Nov 2014 17:29:28 +0000 (UTC) Received: from wonkity.com (wonkity.com [67.158.26.137]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "wonkity.com", Issuer "wonkity.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 42233807 for ; Wed, 19 Nov 2014 17:29:28 +0000 (UTC) Received: from wonkity.com (localhost [127.0.0.1]) by wonkity.com (8.14.9/8.14.9) with ESMTP id sAJHTQuA075307 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 19 Nov 2014 10:29:26 -0700 (MST) (envelope-from wblock@wonkity.com) Received: from localhost (wblock@localhost) by wonkity.com (8.14.9/8.14.9/Submit) with ESMTP id sAJHTQbs075294; Wed, 19 Nov 2014 10:29:26 -0700
(MST) (envelope-from wblock@wonkity.com) Date: Wed, 19 Nov 2014 10:29:26 -0700 (MST) From: Warren Block To: =?ISO-8859-15?Q?Jos=E9_Mar=EDa_Alcaide?= Subject: Re: BIOS booting from disks > 2TB In-Reply-To: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> Message-ID: References: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> User-Agent: Alpine 2.11 (BSF 23 2013-08-11) MIME-Version: 1.0 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (wonkity.com [127.0.0.1]); Wed, 19 Nov 2014 10:29:26 -0700 (MST) Content-Type: TEXT/PLAIN; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 8BIT X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2014 17:29:28 -0000 On Wed, 19 Nov 2014, José María Alcaide wrote: > On Nov 19, 2014, at 8:06 AM, Pokala, Ravi wrote: > >> When you perform your installation, just make sure to select the GPT >> option for partitioning. The installer (either `bsdinstall' (for stock >> FreeBSD), or `pc-sysinstall' (for PC-BSD / FreeNAS)) should create both >> primary (near start-of-disk) and backup (at end-of-disk) GPT tables, and >> install the appropriate bootstrap code in the proper locations. >> > > Yes, bsdinstall flawlessly creates both primary and backup GPT tables > even using disks > 2 TB, by virtue of the FreeBSD kernel. The problem > arises at the first stages of booting, when gptboot tries to compare > the primary and backup tables *using the BIOS disk services*, which > are not able to reach anything after the 2 TB limit. As a consequence > gptboot fails, stating that it did not find the GPT backup table. Maybe kern.geom.part.check_integrity=0 will allow it to boot. However, this sounds like a bug in gptboot. 
Maybe not easy to fix, but increasingly important as disks > 2TB become common. From owner-freebsd-fs@FreeBSD.ORG Thu Nov 20 06:57:43 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D1E3EE06 for ; Thu, 20 Nov 2014 06:57:43 +0000 (UTC) Received: from cu01176a.smtpx.saremail.com (cu01176a.smtpx.saremail.com [195.16.150.151]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8E10E86F for ; Thu, 20 Nov 2014 06:57:43 +0000 (UTC) Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11]) by proxypop03.sare.net (Postfix) with ESMTPSA id 1A14A9DDF86; Thu, 20 Nov 2014 07:57:33 +0100 (CET) Subject: Re: BIOS booting from disks > 2TB Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=iso-8859-1 From: Borja Marcos In-Reply-To: Date: Thu, 20 Nov 2014 07:53:14 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: References: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> To: Warren Block X-Mailer: Apple Mail (2.1283) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Nov 2014 06:57:43 -0000 On Nov 19, 2014, at 6:29 PM, Warren Block wrote: > On Wed, 19 Nov 2014, José María Alcaide wrote: > >> On Nov 19, 2014, at 8:06 AM, Pokala, Ravi wrote: >> >>> When you perform your installation, just make sure to select the GPT >>> option for partitioning.
The installer (either `bsdinstall' (for stock
>>> FreeBSD), or `pc-sysinstall' (for PC-BSD / FreeNAS)) should create both
>>> primary (near start-of-disk) and backup (at end-of-disk) GPT tables, and
>>> install the appropriate bootstrap code in the proper locations.
>>>
>>
>> Yes, bsdinstall flawlessly creates both primary and backup GPT tables even using disks > 2 TB, by virtue of the FreeBSD kernel. The problem arises at the first stages of booting, when gptboot tries to compare the primary and backup tables *using the BIOS disk services*, which are not able to reach anything after the 2 TB limit. As a consequence gptboot fails, stating that it did not find the GPT backup table.
>
> Maybe kern.geom.part.check_integrity=0 will allow it to boot. However, this sounds like a bug in gptboot. Maybe not easy to fix, but increasingly important as disks > 2TB become common.

I did a manual install on a 3 TB disk, creating a small partition for the OS, around 4 GB.

The boot sequence was:

Attempting Boot From Hard Drive (C:)
gptboot: invalid backup GPT header

BTX loader 1.00 BTX version is 1.02
Consoles: internal video/keyboard
BIOS drive C: is disk0
BIOS 614kB/3961744kB available memory

FreeBSD/x86 bootstrap loader, Revision 1.1
(root@releng1.nyi.freebsd.org...)
Can't work out which disk we are booting from.
Guessed BIOS device 0xffffffff not found by probes, defaulting to disk0:

can't load 'kernel'

And that's it. It would be nice indeed if FreeBSD could boot from >2TB disks on BIOS machines.

What I wonder is, is this just some brain dead bug in this machine (HP Proliant Microserver Gen8 with the latest BIOS version) or a widespread problem?

It's not a pressing issue for myself, as anyway I intended to boot from a memstick and use the disks just for a ZFS pool, but anyone trying to set up a ZFS on root boot will run into problems.

Borja.
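As a rough illustration of why a 3 TB disk could trip a loader that can only address sectors with 32 bits, here is a small sketch. It rests on several assumptions that have not been verified on the machine in question: that the backup GPT header sits in the very last sector of the disk (the thread only says "end-of-disk"), that the disk is seen with 512-byte logical sectors, and that its capacity is exactly 3 * 10^12 bytes. The function names are hypothetical.

```python
MAX_LBA_32 = 2 ** 32 - 1  # largest sector number that fits in a 32-bit field


def backup_gpt_header_lba(disk_bytes: int, sector_size: int = 512) -> int:
    """LBA of the last sector, where the backup GPT header is assumed to live."""
    return disk_bytes // sector_size - 1


def reachable_with_32bit_lba(lba: int) -> bool:
    """True if the sector number can be expressed in 32 bits."""
    return lba <= MAX_LBA_32


for label, size in (("1 TB", 10 ** 12), ("3 TB", 3 * 10 ** 12)):
    lba = backup_gpt_header_lba(size)
    print(f"{label}: backup header at LBA {lba}, "
          f"32-bit reachable: {reachable_with_32bit_lba(lba)}")
```

Under these assumptions, the last sector of the 3 TB disk lies beyond the 32-bit range while that of a 1 TB disk does not. This only shows that the sizes are consistent with the hypothesis raised in the thread; it does not establish where the actual fault lies.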
From owner-freebsd-fs@FreeBSD.ORG Thu Nov 20 08:33:01 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 74E09CFD for ; Thu, 20 Nov 2014 08:33:01 +0000 (UTC) Received: from mail.jrv.org (rrcs-24-73-246-106.sw.biz.rr.com [24.73.246.106]) by mx1.freebsd.org (Postfix) with ESMTP id 6C21230C for ; Thu, 20 Nov 2014 08:32:59 +0000 (UTC) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.jrv.org (Postfix) with ESMTP id 5377F1E9EA1; Thu, 20 Nov 2014 02:15:25 -0600 (CST) Received: from mail.jrv.org ([127.0.0.1]) by localhost (zimbra64.housenet.jrv [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id HoEaqaieucBn; Thu, 20 Nov 2014 02:15:15 -0600 (CST) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.jrv.org (Postfix) with ESMTP id 5DE701E9E9C; Thu, 20 Nov 2014 02:15:15 -0600 (CST) X-Virus-Scanned: amavisd-new at zimbra64.housenet.jrv Received: from mail.jrv.org ([127.0.0.1]) by localhost (zimbra64.housenet.jrv [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id ciNBKaAzY6mL; Thu, 20 Nov 2014 02:15:15 -0600 (CST) Received: from [192.168.138.128] (BMX.housenet.jrv [192.168.3.140]) by mail.jrv.org (Postfix) with ESMTPSA id 380F81E9E99; Thu, 20 Nov 2014 02:15:15 -0600 (CST) Message-ID: <546DA321.8050403@jrv.org> Date: Thu, 20 Nov 2014 02:15:29 -0600 From: "James R. 
Van Artsdalen" User-Agent: Mozilla/5.0 (Windows NT 5.0; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Borja Marcos Subject: Re: BIOS booting from disks > 2TB References: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Nov 2014 08:33:01 -0000 The extended BIOS disk functions, introduced onto PCs almost 20 years ago, allow addressing LBAs beyond 2TB. FreeBSD will use these BIOS functions when present. This is usually not a problem. If a disk controller card of some kind is installed then the option ROM on that card must support the extended BIOS disk functions for this to work. This is usually not a problem. The error messages shown only pertained to the backup header, not primary, and looking at the code it implies to me that the primary header and table were read OK, and that these will be used even if the backup cannot be found. I think "invalid backup GPT header" is a warning in this case, not a fatal error. I think the real problem is here: Can't work out which disk we are booting from. Guessed BIOS device 0xffffffff not found by probes, defaulting to disk0: 1. If you replace the 3TB disk with a disk 2TB or smaller and make no other change, does this error still happen? 2. How are the disks connected to the system? What disk controllers are used? What is the system BIOS boot disk setting set to?
On 11/20/2014 12:53 AM, Borja Marcos wrote:
>
> On Nov 19, 2014, at 6:29 PM, Warren Block wrote:
>
>> On Wed, 19 Nov 2014, José María Alcaide wrote:
>>
>>> On Nov 19, 2014, at 8:06 AM, Pokala, Ravi wrote:
>>>
>>>> When you perform your installation, just make sure to select the GPT
>>>> option for partitioning. The installer (either `bsdinstall' (for stock
>>>> FreeBSD), or `pc-sysinstall' (for PC-BSD / FreeNAS)) should create both
>>>> primary (near start-of-disk) and backup (at end-of-disk) GPT tables, and
>>>> install the appropriate bootstrap code in the proper locations.
>>>>
>>>
>>> Yes, bsdinstall flawlessly creates both primary and backup GPT tables even using disks > 2 TB, by virtue of the FreeBSD kernel. The problem arises at the first stages of booting, when gptboot tries to compare the primary and backup tables *using the BIOS disk services*, which are not able to reach anything after the 2 TB limit. As a consequence gptboot fails, stating that it did not find the GPT backup table.
>>
>> Maybe kern.geom.part.check_integrity=0 will allow it to boot. However, this sounds like a bug in gptboot. Maybe not easy to fix, but increasingly important as disks > 2TB become common.
>
> I did a manual install on a 3 TB disk, creating a small partition for the OS, around 4 GB.
>
> The boot sequence was:
>
> Attempting Boot From Hard Drive (C:)
> gptboot: invalid backup GPT header
>
> BTX loader 1.00 BTX version is 1.02
> Consoles: internal video/keyboard
> BIOS drive C: is disk0
> BIOS 614kB/3961744kB available memory
>
> FreeBSD/x86 bootstrap loader, Revision 1.1
> (root@releng1.nyi.freebsd.org...)
> Can't work out which disk we are booting from.
> Guessed BIOS device 0xffffffff not found by probes, defaulting to disk0:
>
> can't load 'kernel'
>
> And that's it. It would be nice indeed if FreeBSD could boot from >2TB disks on BIOS machines.
What > I wonder is, is this just some brain dead bug in this machine (HP Proliant Microserver Gen8 with the latest > BIOS version) or a widespread problem? > > It's not a pressing issue for myself, as anyway I intended to boot from a memstick and use the disks > just for a ZFS pool, but anyone trying to set up a ZFS on root boot will run into problems. > > > > > > > > Borja. > > > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Thu Nov 20 08:43:21 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 76352F6D for ; Thu, 20 Nov 2014 08:43:21 +0000 (UTC) Received: from cu01176b.smtpx.saremail.com (cu01176b.smtpx.saremail.com [195.16.151.151]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 31C995EE for ; Thu, 20 Nov 2014 08:43:20 +0000 (UTC) Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11]) by proxypop04.sare.net (Postfix) with ESMTPSA id 3DCAA9DF0C9; Thu, 20 Nov 2014 09:43:12 +0100 (CET) Subject: Re: BIOS booting from disks > 2TB Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=iso-8859-1 From: Borja Marcos In-Reply-To: <546DA321.8050403@jrv.org> Date: Thu, 20 Nov 2014 09:43:10 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: References: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> <546DA321.8050403@jrv.org> To: "James R.
Van Artsdalen" X-Mailer: Apple Mail (2.1283) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Nov 2014 08:43:21 -0000 On Nov 20, 2014, at 9:15 AM, James R. Van Artsdalen wrote: > > The extended BIOS disk functions, introduced onto PCs almost 20 years ago, allow for addressing LBAs beyond 2TB. FreeBSD will use these BIOS functions when present. This is usually not a problem. Yes, that I was assuming. Something is wrong at least in this machine, though. > If a disk controller card of some kind is installed then the option ROM on that card must support the extended BIOS disk functions for this to work. This is usually not a problem. Aha, so definitely HP blew it with the disk controller (see below) > The error messages shown only pertained to the backup header, not primary, and looking at the code it implies to me that the primary header and table were read OK, and that these will be used even if the backup cannot be found. I think "invalid backup GPT header" is a warning in this case, not a fatal error. Yes, I see the boot progressed beyond that error. > I think the real problem is here: > > Can't work out which disk we are booting from. > Guessed BIOS device 0xffffffff not found by probes, defaulting to disk0: > > 1. If you replace the 3TB disk with a disk 2TB or smaller and make no other change, does this error still happen? Yes. I tried a 1 TB disk and it worked without problems. The other difference between them is the "advanced format". The 1 TB disk has normal 512 byte sectors (it's a two-year-old Samsung, I don't have the model number handy), and the 3 TB disk is a WD Red 3 TB disk which has 4 KB sectors. > 2. How are the disks connected to the system? What disk controllers are used? What is the system BIOS boot disk setting set to?
The disk controller is one of those "array controllers" but it can be configured to work as a stock AHCI one, which is the mode in which I am using it of course. It works flawlessly as an AHCI controller under FreeBSD, for example with ZFS. However, maybe HP has assumed that everyone will use it in "intelligent" mode and they haven't implemented the BIOS code for it correctly, for example missing the BIOS extensions. So, can we assume that it's not a problem with the FreeBSD boot chain, but definitely a poorly implemented BIOS? Thanks! Borja. From owner-freebsd-fs@FreeBSD.ORG Thu Nov 20 09:11:24 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2502E8A4 for ; Thu, 20 Nov 2014 09:11:24 +0000 (UTC) Received: from cu01176b.smtpx.saremail.com (cu01176b.smtpx.saremail.com [195.16.151.151]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D4433943 for ; Thu, 20 Nov 2014 09:11:23 +0000 (UTC) Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11]) by proxypop04.sare.net (Postfix) with ESMTPSA id 7B39A9E0006; Thu, 20 Nov 2014 10:11:21 +0100 (CET) Subject: Re: BIOS booting from disks > 2TB Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=us-ascii From: Borja Marcos In-Reply-To: Date: Thu, 20 Nov 2014 10:11:20 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <3847C9EE-ABE1-4C7D-AA24-318116B0BE8E@sarenet.es> References: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> <546DA321.8050403@jrv.org> To: Borja Marcos X-Mailer: Apple Mail (2.1283) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post:
List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Nov 2014 09:11:24 -0000 On Nov 20, 2014, at 9:43 AM, Borja Marcos wrote: > However, maybe HP has assumed that everyone will use it in "intelligent" mode and they haven't implemented the BIOS > code for it correctly, for example missing the BIOS extensions. > > So, can we assume that it's not a problem with the FreeBSD boot chain, but definitely a poorly implemented BIOS? Replying to myself, a friend tried the same configuration with a MBR partition table and he says it worked. I'll try that this evening and report the results. Borja. From owner-freebsd-fs@FreeBSD.ORG Thu Nov 20 09:39:11 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 07F74350 for ; Thu, 20 Nov 2014 09:39:11 +0000 (UTC) Received: from lucifer.we.lc.ehu.es (lucifer.we.lc.ehu.es [158.227.6.50]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "lucifer.we.lc.ehu.es", Issuer "CA Dpto Electricidad y Electronica" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 83E96B97 for ; Thu, 20 Nov 2014 09:39:09 +0000 (UTC) Received: from ncc-1701.we.lc.ehu.es (ncc-1701.we.lc.ehu.es [158.227.6.85]) (authenticated bits=0) by lucifer.we.lc.ehu.es (8.13.1/8.13.1) with ESMTP id sAK9cx8w002627 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 20 Nov 2014 10:39:00 +0100 (CET) (envelope-from jose@we.lc.ehu.es) Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 8.1 \(1993\)) Subject: Re: BIOS booting from disks > 2TB From: =?iso-8859-1?Q?Jos=E9_Mar=EDa_Alcaide?= In-Reply-To: <546DA321.8050403@jrv.org> Date: Thu, 20 Nov 2014 10:38:58 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: References:
<17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> <546DA321.8050403@jrv.org> To: "James R. Van Artsdalen" X-Mailer: Apple Mail (2.1993) X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (lucifer.we.lc.ehu.es [158.227.6.50]); Thu, 20 Nov 2014 10:39:00 +0100 (CET) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Nov 2014 09:39:11 -0000 > El 20/11/2014, a las 9:15, James R. Van Artsdalen escribió: > > > The extended BIOS disk functions, introduced onto PCs almost 20 years ago, allow for addressing LBAs beyond 2TB. FreeBSD will use these BIOS functions when present. This is usually not a problem. > > If a disk controller card of some kind is installed then the option ROM on that card must support the extended BIOS disk functions for this to work. This is usually not a problem. > > The error messages shown only pertained to the backup header, not primary, and looking at the code it implies to me that the primary header and table were read OK, and that these will be used even if the backup cannot be found. I think "invalid backup GPT header" is a warning in this case, not a fatal error. > > I think the real problem is here: > > Can't work out which disk we are booting from. > Guessed BIOS device 0xffffffff not found by probes, defaulting to disk0: > > 1. If you replace the 3TB disk with a disk 2TB or smaller and make no other change, does this error still happen? > > 2. How are the disks connected to the system? What disk controllers are used? What is the system BIOS boot disk setting set to? > I have an identical system (Proliant Microserver Gen8) with same hardware configuration, same firmware versions and same disks (WD Red 3 TB) as Borja's system.
When the disk controller is configured in AHCI mode there is no way of selecting the boot disk (among those connected to the disk controller). The BIOS only permits selecting the boot disk controller, if there is more than one (which is not the case). So I think that the message Can't work out which disk we are booting from. Guessed BIOS device 0xffffffff not found by probes, defaulting to disk0: is harmless (given that we actually want to boot from the first disk). In fact I did another test: I installed FreeBSD on an 8 GB partition *using the MBR scheme instead of GPT*, and the system booted from the first 3 TB disk without any problem (and with four disks attached), despite showing the same warning message. -- Jose M. Alcaide From owner-freebsd-fs@FreeBSD.ORG Thu Nov 20 09:50:37 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2AE4C889 for ; Thu, 20 Nov 2014 09:50:37 +0000 (UTC) Received: from cu01176a.smtpx.saremail.com (cu1176c.smtpx.saremail.com [195.16.148.151]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D9DA4CD1 for ; Thu, 20 Nov 2014 09:50:35 +0000 (UTC) Received: from [172.16.2.2] (izaro.sarenet.es [192.148.167.11]) by proxypop02.sare.net (Postfix) with ESMTPSA id 3369B9DD957; Thu, 20 Nov 2014 10:50:26 +0100 (CET) Subject: Re: BIOS booting from disks > 2TB Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=iso-8859-1 From: Borja Marcos In-Reply-To: Date: Thu, 20 Nov 2014 10:50:24 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <7CAD9C0C-E793-4DA8-8ED0-AAB01C77F52C@sarenet.es> References: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> <546DA321.8050403@jrv.org>
To: =?iso-8859-1?Q?Jos=E9_Mar=EDa_Alcaide?= X-Mailer: Apple Mail (2.1283) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Nov 2014 09:50:37 -0000 On Nov 20, 2014, at 10:38 AM, José María Alcaide wrote: > Can't work out which disk we are booting from. > Guessed BIOS device 0xffffffff not found by probes, defaulting to disk0: > > is harmless (given that we actually want to boot from the first disk). > > In fact I did another test: I installed FreeBSD on an 8 GB partition *using the MBR scheme instead of GPT*, and the system booted from the first 3 TB disk without any problem (and with four disks attached), despite of showing the same warning message. So, that 0xffffffff might be a buffer overflow being triggered by a failed attempt to read the backup GPT table? Let's assume that the BIOS is poorly implemented and it won't read beyond the 2 TB limit. As far as I know, booting from a MBR disk doesn't require reading anything but the "classic" partition table and the partition we are using. So, as long as the partition fits inside that 2 TB limit it should work, and it does. Booting from GPT, however, requires reading the end of the disk. Or is the backup copy of the partition table read if and only if there's some problem with the main one? Can the BIOS be reporting a wrong size for the disk (after all we are assuming a dodgy BIOS) and making gptboot consider the table corrupt, hence causing it to try to read the backup copy? Whatever, that 0xffffffff parameter (which should be something like 0x80) looks like a corrupted variable to me, which would mean we have some buffer overflow? Maybe I'll try to recompile the boot chain removing the backup reading and the sanity checks and see what happens. Borja.
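[Editorial sketch, not part of the original message.] The arithmetic behind the "2 TB limit" hypothesis above can be checked with a few lines of Python. This is only a rough illustration under stated assumptions: 512-byte logical sectors, the standard GPT layout with a 32-sector partition entry array, and the sector counts taken from the `gpart show ada0` output posted later in this thread. The helper names are hypothetical, and whether this BIOS really truncates LBAs to 32 bits is not confirmed anywhere in the thread.

```python
# Assumption: the drive exposes 512-byte logical sectors.
SECTOR_SIZE = 512
# Largest LBA that fits in 32 bits (the limit being hypothesised in the thread).
MAX_LBA32 = 2**32 - 1


def backup_header_lba(first_usable_lba, usable_sectors, entry_array_sectors=32):
    """Return the LBA of the backup GPT header (the last sector of the disk).

    Assumes the standard layout: usable area, then the backup partition
    entry array (32 sectors for 128 entries of 128 bytes), then the header.
    """
    last_usable_lba = first_usable_lba + usable_sectors - 1
    return last_usable_lba + entry_array_sectors + 1


def reachable_with_32bit_lba(lba):
    """True if a disk service limited to 32-bit LBAs could address this sector."""
    return lba <= MAX_LBA32


if __name__ == "__main__":
    # Figures from the gpart output in this thread: "=> 34 5860533101 ada0 GPT".
    backup = backup_header_lba(34, 5860533101)
    # Last sector of the freebsd-ufs partition: start 1064, size 67108864.
    ufs_end = 1064 + 67108864 - 1
    print("32-bit LBA limit in bytes:", 2**32 * SECTOR_SIZE)
    print("backup GPT header LBA:", backup, reachable_with_32bit_lba(backup))
    print("end of freebsd-ufs partition:", ufs_end, reachable_with_32bit_lba(ufs_end))
```

Under these assumptions the backup header sits beyond the 32-bit range while the boot and UFS partitions sit well inside it, which would be consistent with a failed backup-header read but would not by itself explain a failure to load the kernel.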
From owner-freebsd-fs@FreeBSD.ORG Thu Nov 20 10:46:23 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B1CF4B90 for ; Thu, 20 Nov 2014 10:46:23 +0000 (UTC) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.21.123]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "smtp-sofia.digsys.bg", Issuer "Digital Systems Operational CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4DCFB5FD for ; Thu, 20 Nov 2014 10:46:22 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [193.68.6.1]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.6/8.14.6) with ESMTP id sAKAXava024662 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO) for ; Thu, 20 Nov 2014 12:33:37 +0200 (EET) (envelope-from daniel@digsys.bg) Message-ID: <546DC380.1040707@digsys.bg> Date: Thu, 20 Nov 2014 12:33:36 +0200 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZFS and glabel References: <1422065A4E115F409E22C1EC9EDAFBA4220D0DB7@sofdc01exc02.postbank.bg> In-Reply-To: <1422065A4E115F409E22C1EC9EDAFBA4220D0DB7@sofdc01exc02.postbank.bg> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Nov 2014 10:46:23 -0000 Hi Ivailo, The FreeBSD glabel is in bit of a mess, indeed. It is a mess not because the tech is bad or buggy (although there are caveats), but because the glabel tool had made it all too confusing by displaying them all together. 
Or perhaps our assumptions are wrong if one needs to be more precise. We have several kinds of labels. Each of them lives under its own namespace (a subdir of /dev). There is the glabel type, which you manipulate with glabel - this lives under /dev/label. There are the geom labels that you manipulate via gpart that live under /dev/gpt. There are the gmirror labels that you manipulate with gmirror and live under /dev/mirror. There are the disk ID labels that live under /dev/diskid. There are the UFS labels that you manipulate with newfs/tunefs that live under /dev/ufs. Perhaps there are others I missed. Then comes ZFS. For its own sanity, ZFS would label the devices it's given with its own labels -- so that when you reboot or move the pool to another machine it still finds its members and structure. If it can find its own labels, that is... As a consequence of this, the safest way to use ZFS is with whole devices. This pretty much guarantees your ZFS pool will be portable across any system and ZFS will *always* be able to find it, no matter what. The drawback is you might not know for sure which device id is which physical drive, because many factors might influence device name reordering. But this is pretty much the only drawback. The diskid should work in a similar way. On systems that don't have disk ids, you will fall back to the device name, so no big deal. The next "safest" thing is the GPT label, which you create with gpart. Many systems (non FreeBSD) support it and your pool will be just fine there. Worst are glabel and gmirror, mostly because they have trouble being nested. But as long as you stick to some simple rules, these work ok too. What you are seeing is when you destroy the label, ZFS can no longer find its own labels. This is because when you destroy the label ZFS has no idea where to look for it -- what the offset would be.
If in your example, you recreate the label again, that pool will suddenly work again -- even if you use a different name for the new label -- the ZFS's own label will be then discoverable again. I myself prefer either raw disks or GPT. The latter especially in smaller systems, where I would use GPT for boot partitions anyway. But also on systems with tens of drives, where I need to know the physical location of the drive (and not care much about its serial number at that moment, which would be the case of using diskid labels). On these systems, I would label the GPT partition with chassis/position name. By the way, I still have a few systems that use glabels (dev/label). Daniel On 17.11.14 14:42, Ivailo A. Tanusheff wrote: > Dear all, > > I run to an interesting issue and I would like to discuss it with all of you. > The whole thing began with me trying to identify available HDD to include in a zfs pool through a script/program. > I assumed that the easiest way of doing this is using glabel. For example: > > root@FreeBSD:~ # glabel status > Name Status Components > gptid/248e758c-e267-11e3-95bb-08002796202b N/A ada0p1 > diskid/DISK-VBdd471206-91164057 N/A ada5 > diskid/DISK-VBe98b5e75-0d8cf6dc N/A ada8 > diskid/DISK-VB7d006584-01beca12 N/A ada6 > diskid/DISK-VB721029c3-66a60156 N/A ada7 > diskid/DISK-VB31481dbb-639540a1 N/A ada2 > diskid/DISK-VB95921208-4eb19f41 N/A ada4 > > So far it is OK and if I create pool like zpool create xxx ada4 then the line for ada4 will disappear from the glabel status. > As far as I remember though it is not recommended to use production pools based on the device naming, so I wanted to switch to gpt lable, i.e. diskid/DISK-VB95921208-4eb19f41. > When I recreate pool like: > zpool create xxx diskid/DISK-VB95921208-4eb19f41 the pool is created without problems, but the device does not disappear from the glabel status list, thus making my program running wrong.
> Is this a problem with the zfs implementation, my server or the general idea is wrong? > > BTW, if I label the disk additionally, like: > glabel create VB95921208-4eb19f41 ada4 > zpool create xxx label/VB95921208-4eb19f41 > > The glabel status again shows the right information. The problem with the latest approach is that if someone executes: > glabel destroy -f VB95921208-4eb19f41 > > The result becomes: > pool: xxx > state: UNAVAIL > status: One or more devices are faulted in response to IO failures. > action: Make sure the affected devices are connected, then run 'zpool clear'. > see: http://illumos.org/msg/ZFS-8000-HC > scan: none requested > config: > > NAME STATE READ WRITE CKSUM > xxx UNAVAIL 0 0 0 > 6968348230421469155 REMOVED 0 0 0 was /dev/label/VB95921208-4eb19f41 > > And the data is practically unrecoverable. > > So my questions are: > - Is there a way to make glabel to show the right data when I use diskid/DISK-VB95921208-4eb19f41 > - Which is the most proper way of creating vdevs - with disk name (ada4), diskid (diskid/DISK-VB95921208-4eb19f41) or manual labeling? > - How may I found which disks are free, if the diskid approach is the best solution? > > > Regards, > > Ivailo Tanusheff > > Disclaimer: > > This communication is confidential. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you have received this communication by mistake, please notify us immediately by responding to this email and then delete it from your system. > Eurobank Bulgaria AD is not responsible for, nor endorses, any opinion, recommendation, conclusion, solicitation, offer or agreement or any information contained in this communication. > Eurobank Bulgaria AD cannot accept any responsibility for the accuracy or completeness of this message as it has been transmitted over a public network. 
If you suspect that the message may have been intercepted or amended, please call the sender. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Thu Nov 20 17:57:07 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 60F432C5 for ; Thu, 20 Nov 2014 17:57:07 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3927EF12 for ; Thu, 20 Nov 2014 17:57:07 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 319ADB97B; Thu, 20 Nov 2014 12:57:06 -0500 (EST) From: John Baldwin To: freebsd-fs@freebsd.org Subject: Re: BIOS booting from disks > 2TB Date: Thu, 20 Nov 2014 11:10:44 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; ) References: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> <7CAD9C0C-E793-4DA8-8ED0-AAB01C77F52C@sarenet.es> In-Reply-To: <7CAD9C0C-E793-4DA8-8ED0-AAB01C77F52C@sarenet.es> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Message-Id: <201411201110.45066.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 20 Nov 2014 12:57:06 -0500 (EST) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: 
Thu, 20 Nov 2014 17:57:07 -0000 On Thursday, November 20, 2014 4:50:24 am Borja Marcos wrote: > > On Nov 20, 2014, at 10:38 AM, José María Alcaide wrote: > > > Can't work out which disk we are booting from. > > Guessed BIOS device 0xffffffff not found by probes, defaulting to disk0: > > > > is harmless (given that we actually want to boot from the first disk). > > > > In fact I did another test: I installed FreeBSD on an 8 GB partition *using the MBR scheme instead of GPT*, and the system booted from the first 3 TB disk without any problem (and with four disks attached), despite of showing the same warning message. > > So, that 0xffffffff might be a buffer overflow being triggered by a failed attempt to read the backup GPT table? No. Anytime the early boot stage doesn't like the drive the loader ends up using a device number of -1. It's not really an overflow, but an error indicator. > Let's assume that the BIOS is poorly implemented and it won't read beyond the 2 TB limit. That would be really odd since EDD has existed and supported 64-bit LBAs since 1995. > As far as I know, booting from a MBR disk doesn't require reading anything but the "classic" partition table and the partition we > are using. So, as long as the partition fits inside that 2 TB limit it should work, and it does. Booting requires reading files like /boot/loader which can be anywhere on the disk. However, MBR partitions are limited to 32-bit LBAs, so your filesystem is similarly limited. > Booting from GPT, however, requires reading the end of the disk. Or is the backup copy of the partition table read if and only if > there's some problem with the main one? Booting requires more than reading tables, it requires reading files and those can be anywhere. > Can BIOS be reporting a wrong size for the disk (after all we are assuming a dodgy BIOS) and making gptboot to cosider the > table corrupt, hence causing it to try to read the backup copy?
> > Whatever, that 0xffffffff parameter (which should be something like 0x80) looks like a corrupted variable to me, which would mean > we have some buffer overflow? No, as I said above, it is an error indicator, not an overflow. > Maybe I'll try to recompile the boot chain removing the backup reading and the sanity checks and see what happens. Can you start with 'lsdev -v' at the loader prompt? -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Thu Nov 20 19:08:59 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 531FB662; Thu, 20 Nov 2014 19:08:59 +0000 (UTC) Received: from cu01176b.smtpx.saremail.com (cu01176b.smtpx.saremail.com [195.16.151.151]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 102BBC52; Thu, 20 Nov 2014 19:08:58 +0000 (UTC) Received: from www.saremail.com (unknown [194.30.0.100]) by proxypop04.sare.net (Postfix) with ESMTPSA id CFFA59DE4D4; Thu, 20 Nov 2014 20:08:54 +0100 (CET) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Date: Thu, 20 Nov 2014 20:08:54 +0100 From: borjam@sarenet.es To: John Baldwin Subject: Re: BIOS booting from disks > 2TB In-Reply-To: <201411201110.45066.jhb@freebsd.org> References: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> <7CAD9C0C-E793-4DA8-8ED0-AAB01C77F52C@sarenet.es> <201411201110.45066.jhb@freebsd.org> Message-ID: X-Sender: borjam@sarenet.es User-Agent: Saremail/0.8 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Nov 2014 19:08:59 -0000 El 20.11.2014 17:10,
John Baldwin escribió: >> So, that 0xffffffff might be a buffer overflow being triggered by a >> failed attempt to read the backup GPT table? > > No. Anytime the early boot stage doesn't like the drive the loader > ends up > using a device number of -1. It's not really an overflow, but an error > indicator. Oh, sorry. Understood. >> Let's assume that the BIOS is poorly implemented and it won't read >> beyond the 2 TB limit. > > That would be really odd since EDD has existed and supported 64-bit > LBAs since > 1995. Don't underestimate the terrific capability for surrealism of some peecee makers... ;) >> As far as I know, booting from a MBR disk doesn't require reading >> anything but the "classic" partition table and the partition we >> are using. So, as long as the partition fits inside that 2 TB limit it >> should work, and it does. > > Booting requires reading files like /boot/loader which can be anywhere > on the > disk. However, MBR partitions are limited to 32-bit LBAs, so your > filesystem > is similarly limited. As a matter of fact, I tried creating a MBR partition table and it worked. It just fails with the GPT. >> Booting from GPT, however, requires reading the end of the disk. Or is >> the backup copy of the partition table read if and only if >> there's some problem with the main one? > > Booting requires more than reading tables, it requires reading files > and those > can be anywhere. Yes, I understand they can be anywhere of course. > >> Can BIOS be reporting a wrong size for the disk (after all we are >> assuming a dodgy BIOS) and making gptboot to cosider the >> table corrupt, hence causing it to try to read the backup copy? >> >> Whatever, that 0xffffffff parameter (which should be something like >> 0x80) looks like a corrupted variable to me, which would mean >> we have some buffer overflow? > > No, as I said above, it is an error indicator, not an overflow. I stand corrected :) > Can you start with 'lsdev -v' at the loader prompt? Sure! 
cd devices: disk devices: disk0: BIOS drive C: pxe devices: ls doesn't see anything. I assume that ls disk0:/ should have worked, right? I tried setting root_disk_unit to 0, 1, 2... and ls doesn't see anything. I must admit, to my shame (I was a member of the Forth Interest Group 20 years ago!) that I am completely lost with the loader. For me it's always been a matter of install and forget ;) Borja. From owner-freebsd-fs@FreeBSD.ORG Thu Nov 20 21:33:15 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 07DDA390 for ; Thu, 20 Nov 2014 21:33:15 +0000 (UTC) Received: from cu01176a.smtpx.saremail.com (cu01176a.smtpx.saremail.com [195.16.150.151]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B8E14F7A for ; Thu, 20 Nov 2014 21:33:14 +0000 (UTC) Received: from www.saremail.com (unknown [194.30.0.100]) by proxypop03.sare.net (Postfix) with ESMTPSA id BFF659DE138 for ; Thu, 20 Nov 2014 22:33:10 +0100 (CET) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Date: Thu, 20 Nov 2014 22:33:10 +0100 From: borjam@sarenet.es To: freebsd-fs@freebsd.org Subject: Re: BIOS booting from disks > 2TB In-Reply-To: References: <17A2AC72-AD70-480A-9BAC-9CC8EAFD572F@we.lc.ehu.es> <7CAD9C0C-E793-4DA8-8ED0-AAB01C77F52C@sarenet.es> <201411201110.45066.jhb@freebsd.org> Message-ID: <4de73204bc0459a97d6c076bdacbed86@sarenet.es> X-Sender: borjam@sarenet.es User-Agent: Saremail/0.8 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Nov 2014 21:33:15 -0000 El 20.11.2014 20:08, 
borjam@sarenet.es escribió: >> Booting requires more than reading tables, it requires reading files >> and those >> can be anywhere. > > Yes, I understand they can be anywhere of course. This is the GPT table. I booted from the memstick again:
# gpart show ada0
=>          34  5860533101  ada0  GPT  (2.7T)
            34           6        - free -  (3.0K)
            40        1024     1  freebsd-boot  (512K)
          1064    67108864     2  freebsd-ufs  (32G)
      67109928  5793423207        - free -  (2.7T)
If the main table was broken it would complain, right? Borja. From owner-freebsd-fs@FreeBSD.ORG Fri Nov 21 00:14:25 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 76E233C3; Fri, 21 Nov 2014 00:14:25 +0000 (UTC) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.81]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1DA0C31C; Fri, 21 Nov 2014 00:14:24 +0000 (UTC) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.72) (envelope-from ) id 1Xrbrd-0005mq-5K; Fri, 21 Nov 2014 01:14:22 +0100 Content-Type: multipart/mixed; boundary=----------JxeXu0Csk7rvM8RhMGHkI1 To: "Konstantin Belousov" , "Ian Lepore" Subject: Re: panic in nfs on arm References: <1388627434.7506173.1414279273153.JavaMail.root@uoguelph.ca> <20141026075720.GO1877@kib.kiev.ua> <1414335557.12052.672.camel@revolution.hippie.lan> Date: Fri, 21 Nov 2014 01:14:16 +0100 MIME-Version: 1.0 From: "Ronald Klop" Message-ID: In-Reply-To: <1414335557.12052.672.camel@revolution.hippie.lan> User-Agent: Opera Mail/12.16 (FreeBSD) X-Authenticated-As-Hash: 398f5522cb258ce43cb679602f8cfe8b62a256d1 X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: / X-Spam-Score: -0.2 X-Spam-Status: No, score=-0.2
required=5.0 tests=ALL_TRUSTED, BAYES_50 autolearn=disabled version=3.3.2 X-Scan-Signature: e919acc199487664c4b881d73fbb3695 Cc: freebsd-fs@freebsd.org, freebsd-arm@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Nov 2014 00:14:25 -0000 ------------JxeXu0Csk7rvM8RhMGHkI1 Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit On Sun, 26 Oct 2014 15:59:17 +0100, Ian Lepore wrote: > On Sun, 2014-10-26 at 09:57 +0200, Konstantin Belousov wrote: >> On Sat, Oct 25, 2014 at 07:21:13PM -0400, Rick Macklem wrote: >> > Ronald Klop wrote: >> > > Hi, >> > > >> > > I got a panic on my arm computer while building a port with >> > > /usr/ports >> > > mounted from my FreeBSD-10-STABLE/amd64 machine. >> > > >> > > This is the machine which paniced: >> > > FreeBSD 11.0-CURRENT #1 r272028M: Tue Sep 23 17:11:45 CEST 2014 >> > > root@sjakie.klop.ws:/usr/obj-arm/arm.arm/usr/src-arm/sys/SHEEVAPLUG >> > > arm >> > > >> > > >> > > Tracing pid 90295 tid 100119 td 0xc5f8c960 >> > > db_trace_self() at db_trace_self >> > > pc = 0xc0bb12c8 lr = 0xc0bb1354 (db_trace_thread+0x50) >> > > sp = 0xdf29e5d0 fp = 0xc3e07120 >> > > db_trace_thread() at db_trace_thread+0x50 >> > > pc = 0xc0bb1354 lr = 0xc0936314 (db_command_init+0x5a4) >> > > sp = 0xdf29e630 fp = 0xc3e07120 >> > > db_command_init() at db_command_init+0x5a4 >> > > pc = 0xc0936314 lr = 0xc0935ad0 (db_skip_to_eol+0x484) >> > > sp = 0xdf29e648 fp = 0xc3e07120 >> > > r4 = 0xc0c8d350 r5 = 0x00000000 >> > > db_skip_to_eol() at db_skip_to_eol+0x484 >> > > pc = 0xc0935ad0 lr = 0xc0935c38 (db_command_loop+0x5c) >> > > sp = 0xdf29e6e8 fp = 0xc3e07120 >> > > r4 = 0xdf29e6fc r5 = 0xc0c8d64c >> > > r6 = 0x3cd90e75 r7 = 0x00000000 >> > > r8 = 0x00000001 r10 = 0x600000d3 >> > > db_command_loop() at db_command_loop+0x5c >> > > pc = 0xc0935c38 
lr = 0xc0937f80 (X_db_sym_numargs+0xec) >> > > sp = 0xdf29e6f0 fp = 0xc3e07120 >> > > X_db_sym_numargs() at X_db_sym_numargs+0xec >> > > pc = 0xc0937f80 lr = 0xc0a6f0c0 (kdb_trap+0x94) >> > > sp = 0xdf29e808 fp = 0xc3e07120 >> > > r4 = 0xdf29e8f8 >> > > kdb_trap() at kdb_trap+0x94 >> > > pc = 0xc0a6f0c0 lr = 0xc0bc1d60 (badaddr_read+0x274) >> > > sp = 0xdf29e828 fp = 0xc3e07120 >> > > r4 = 0xdf29e8f8 r5 = 0x00000001 >> > > r6 = 0x3cd90e75 r7 = 0xc5f8c960 >> > > r8 = 0xdf29e8f8 r10 = 0xdf2a1eb0 >> > > badaddr_read() at badaddr_read+0x274 >> > > pc = 0xc0bc1d60 lr = 0xc0bc1e98 (badaddr_read+0x3ac) >> > > sp = 0xdf29e840 fp = 0xc3e07120 >> > > r4 = 0xc5f8c960 r5 = 0xdf29e8f8 >> > > r6 = 0x3cd90e05 >> > > badaddr_read() at badaddr_read+0x3ac >> > > pc = 0xc0bc1e98 lr = 0xc0bc2278 >> (data_abort_handler+0x10c) >> > > sp = 0xdf29e858 fp = 0xc3e07120 >> > > r4 = 0xc0cd8af8 r5 = 0xffff1004 >> > > data_abort_handler() at data_abort_handler+0x10c >> > > pc = 0xc0bc2278 lr = 0xc0bb2f40 (exception_exit) >> > > sp = 0xdf29e8f8 fp = 0xc3e07120 >> > > r4 = 0xffffffff r5 = 0xffff1004 >> > > r6 = 0x3cd90e05 r7 = 0xc0e0ea48 >> > > r8 = 0x0000000f r9 = 0x00000101 >> > > r10 = 0x0000001d >> > > exception_exit() at exception_exit >> > > pc = 0xc0bb2f40 lr = 0xc0b8daf8 (uma_reclaim+0x1f8) >> > > sp = 0xdf29e948 fp = 0xc3e07120 >> > > r0 = 0xba9b9127 r1 = 0x8b3de5fb >> > > r2 = 0xc61c1fc8 r3 = 0xba9b9126 >> > > r4 = 0x00000000 r5 = 0xc61c1fc8 >> > > r6 = 0x3cd90e05 r7 = 0xc0e0ea48 >> > > r8 = 0x0000000f r9 = 0x00000101 >> > > r10 = 0x0000001d r12 = 0x00000000 >> > > uma_reclaim() at uma_reclaim+0x24c >> > This looks to me like a crash in uma_reclaim() and I find UMA >> > way too obscure to understand. >> > >> > I have no idea if it might be related, but alc@ put a fix for low >> > memory situations in r272071 (or maybe it's r272221?). >> > >> > Might be worth trying a slightly newer kernel to see if the >> > problem still occurs. 
>> > >> > And hopefully someone more conversant with UMA (or this stack >> > trace) can help more. >> > >> > rick >> > >> > > pc = 0xc0b8db4c lr = 0xc0b8c800 (uma_zalloc_arg+0x2f0) >> > > sp = 0xdf29e978 fp = 0xdf29ec10 >> > > r4 = 0xc3e071d8 r5 = 0xc0e0ea00 >> > > r6 = 0xc3e07120 r7 = 0x00000000 >> > > r8 = 0x00000102 r9 = 0xdf29ecf8 >> > > r10 = 0xc61c0760 >> > > uma_zalloc_arg() at uma_zalloc_arg+0x2f0 >> uma_reclaim() is not called from uma_zalloc(). >> I think there is some issue with ddb on arm, which means that >> the backtrace is not useful. See below for one more. >> > > pc = 0xc0b8c800 lr = 0xc09e1df0 (nfscl_nget+0x308) >> > > sp = 0xdf29e990 fp = 0xdf29ec10 >> > > r4 = 0x9bb9fa43 r5 = 0x00000000 >> > > r6 = 0xc550dce8 r7 = 0xc3edaa00 >> > > r8 = 0xc3ebbac0 >> > > nfscl_nget() at nfscl_nget+0x308 >> > > pc = 0xc09e1df0 lr = 0xc09da69c (ncl_readlinkrpc+0xf60) >> > > sp = 0xdf29e9d8 fp = 0xdf29ea10 >> > > r4 = 0xc550dce8 r5 = 0x00000000 >> > > r6 = 0xc550dcf8 r7 = 0xdf29ecf8 >> > > r8 = 0xdf29ec6c r9 = 0x00000000 >> > > r10 = 0xdf29ed28 >> > > ncl_readlinkrpc() at ncl_readlinkrpc+0xf60 >> > > pc = 0xc09da69c lr = 0xc0bdae44 (VOP_MKDIR_APV+0x94) >> > > sp = 0xdf29ec40 fp = 0xbffff620 >> > > r4 = 0xc0c95c68 r5 = 0xdf29ec6c >> > > r6 = 0x00000001 r7 = 0x00020284 >> > > r8 = 0xffffff9c r9 = 0x00200800 >> > > r10 = 0xc5f8c960 >> > > VOP_MKDIR_APV() at VOP_MKDIR_APV+0x94 >> I do not see how VOP_MKDIR() may end up calling ncl_readlinkrpc(), >> esp. without intervening frame. >> > > Notice that the address is actually ncl_readlinkrpc+0xf60, 0xf60 is a > pretty big offset into a function, it's probably in some static function > that follows ncl_readlinkrpc in the source file but the symbol info has > been stripped. Using addr2line on the pc and lr values will give > reliable source line numbers (but I can't do that without Ronald's > kernel config). > > -- Ian Sorry for the late reply and I don't know if it is very interesting (or high prio) still. 
Attached my kernel config and the diff of my source tree. Ronald. > > > >> > > pc = 0xc0bdae44 lr = 0xc0aca614 (kern_mkdirat+0x18c) >> > > sp = 0xdf29ec50 fp = 0xbffff620 >> > > r4 = 0xdf29ed28 r5 = 0xdf29ec90 >> > > r6 = 0x00000000 >> > > kern_mkdirat() at kern_mkdirat+0x18c >> > > pc = 0xc0aca614 lr = 0xc0aca684 (kern_mkdir+0x24) >> > > sp = 0xdf29ede0 fp = 0xbffff620 >> > > r4 = 0x00020290 r5 = 0xc5f8c960 >> > > r6 = 0x00000000 r7 = 0xc5f7f000 >> > > r8 = 0x00000000 r10 = 0x00013640 >> > > kern_mkdir() at kern_mkdir+0x24 >> > > pc = 0xc0aca684 lr = 0xc0aca6a8 (sys_mkdir+0x1c) >> > > sp = 0xdf29edf0 fp = 0xbffff620 >> > > sys_mkdir() at sys_mkdir+0x1c >> > > pc = 0xc0aca6a8 lr = 0xc0bc2884 (swi_handler+0x254) >> > > sp = 0xdf29edf8 fp = 0xbffff620 >> > > swi_handler() at swi_handler+0x254 >> > > pc = 0xc0bc2884 lr = 0xc0bb2ed0 (swi_exit) >> > > sp = 0xdf29ee60 fp = 0xbffff620 >> > > r4 = 0x00020290 r5 = 0x2085e8e0 >> > > r6 = 0x00020284 r7 = 0x00000088 >> > > r8 = 0x00000001 >> > > swi_exit() at swi_exit >> > > pc = 0xc0bb2ed0 lr = 0xc0bb2ed0 (swi_exit) >> > > sp = 0xdf29ee60 fp = 0xbffff620 >> > > Unable to unwind further >> > > >> > > >> > > Unfortunately dumping the kernel core also paniced. 
>> > > db> dump >> > > Physical memory: 507 MB >> > > Dumping 74 MB: 71 67 63 >> > > vm_fault(0xc4147000, 0, 1, 0) -> 0 >> > > Fatal kernel mode data abort: 'Translation Fault (P)' >> > > trapframe: 0xdf29e0b8 >> > > FSR=00000017, FAR=00000014, spsr=a00000d3 >> > > r0 =c0cd0f40, r1 =00000000, r2 =c5f8c960, r3 =00000004 >> > > r4 =00000000, r5 =00000000, r6 =00000000, r7 =c3ead01c >> > > r8 =c3ead000, r9 =c3e9e88c, r10=00000000, r11=0000000a >> > > r12=600000d3, ssp=df29e108, slr=c0bb4e24, pc =c0a7d060 >> > > >> > > panic: Fatal abort >> > > Uptime: 3d18h30m32s >> > > Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock > > > _______________________________________________ > freebsd-arm@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arm > To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org" ------------JxeXu0Csk7rvM8RhMGHkI1 Content-Disposition: attachment; filename=SHEEVAPLUG Content-Type: application/octet-stream; name=SHEEVAPLUG Content-Transfer-Encoding: Base64 IwojIEN1c3RvbSBrZXJuZWwgZm9yIE1hcnZlbGwgU2hlZXZhUGx1ZyBkZXZpY2Vz LgojCiMgJEZyZWVCU0Q6IGhlYWQvc3lzL2FybS9jb25mL1NIRUVWQVBMVUcgMjYz MzAxIDIwMTQtMDMtMTggMTQ6NDE6MThaIGltcCAkCiMKCmlkZW50CQlTSEVFVkFQ TFVHCmluY2x1ZGUJCSIuLi9tdi9raXJrd29vZC9zdGQuZGI4OGY2eHh4IgoKb3B0 aW9ucyAJU09DX01WX0tJUktXT09ECm1ha2VvcHRpb25zCU1PRFVMRVNfT1ZFUlJJ REU9IiIKCm1ha2VvcHRpb25zCURFQlVHPS1nCQkjIEJ1aWxkIGtlcm5lbCB3aXRo IGdkYigxKSBkZWJ1ZyBzeW1ib2xzCiNvcHRpb25zCQlJTlZBUklBTlRTCiNvcHRp b25zCQlJTlZBUklBTlRfU1VQUE9SVAojb3B0aW9ucwkJV0lUTkVTUwojb3B0aW9u cwkJREVCVUdfTE9DS1MKI29wdGlvbnMJCURFQlVHX1ZGU19MT0NLUwojb3B0aW9u cwkJRElBR05PU1RJQwptYWtlb3B0aW9ucwlXRVJST1I9Ii1XZXJyb3IiCm9wdGlv biBLU1RBQ0tfUEFHRVM9NQpvcHRpb24gQVJNX0RFVklDRV9NVUxUSVBBU1MKCm9w dGlvbnMgCVNDSEVEXzRCU0QJCSMgNEJTRCBzY2hlZHVsZXIKb3B0aW9ucyAJSU5F VAkJCSMgSW50ZXJORVR3b3JraW5nCiNvcHRpb25zIAlJTkVUNgkJCSMgSVB2NiBj b21tdW5pY2F0aW9ucyBwcm90b2NvbHMKb3B0aW9ucyAJR0VPTV9QQVJUX0JTRAkJ 
IyBCU0QgcGFydGl0aW9uIHNjaGVtZQpvcHRpb25zIAlHRU9NX1BBUlRfTUJSCQkj IE1CUiBwYXJ0aXRpb24gc2NoZW1lCm9wdGlvbnMgCVRNUEZTCQkJIyBFZmZpY2ll bnQgbWVtb3J5IGZpbGVzeXN0ZW0Kb3B0aW9ucyAJRkZTCQkJIyBCZXJrZWxleSBG YXN0IEZpbGVzeXN0ZW0Kb3B0aW9ucyAJTkFOREZTCQkJIyBOQU5EIEZpbGVzeXN0 ZW0Kb3B0aW9ucyAJTkZTQ0wJCQkjIE5ldyBOZXR3b3JrIEZpbGVzeXN0ZW0gQ2xp ZW50Cm9wdGlvbnMgCU5GU0xPQ0tECQkjIE5ldHdvcmsgTG9jayBNYW5hZ2VyCiNv cHRpb25zIAlORlNfUk9PVAkJIyBORlMgdXNhYmxlIGFzIC8sIHJlcXVpcmVzIE5G U0NMCiNvcHRpb25zIAlCT09UUAojb3B0aW9ucyAJQk9PVFBfTkZTUk9PVAojb3B0 aW9ucyAJQk9PVFBfTkZTVjMKI29wdGlvbnMgCUJPT1RQX1dJUkVEX1RPPW1nZTAK IyBBZGRlZCBieSBSb25hbGQKb3B0aW9ucyAJU09GVFVQREFURVMKb3B0aW9ucyAJ TVNET1NGUwpvcHRpb25zIAlJUEZJUkVXQUxMCm9wdGlvbnMgCUlQRklSRVdBTExf REVGQVVMVF9UT19BQ0NFUFQKCiMgUm9vdCBmcyBvbiBVU0IgZGV2aWNlCm9wdGlv bnMgCVJPT1RERVZOQU1FPVwidWZzOi9kZXYvZGEwczFhXCIKI29wdGlvbnMgCVJP T1RERVZOQU1FPVwidWZzOi9kZXYvZGEwczJkXCIKI29wdGlvbnMgCVJPT1RERVZO QU1FPVwibmFuZGZzOi9kZXYvZGEwczJkXCIKI29wdGlvbnMgCVJPT1RERVZOQU1F PVwibmFuZGZzOi9kZXYvZ25hbmQwcy5yb290XCIKCm9wdGlvbnMgCVNZU1ZTSE0J CQkjIFNZU1Ytc3R5bGUgc2hhcmVkIG1lbW9yeQpvcHRpb25zIAlTWVNWTVNHCQkJ IyBTWVNWLXN0eWxlIG1lc3NhZ2UgcXVldWVzCm9wdGlvbnMgCVNZU1ZTRU0JCQkj IFNZU1Ytc3R5bGUgc2VtYXBob3JlcwpvcHRpb25zIAlfS1BPU0lYX1BSSU9SSVRZ X1NDSEVEVUxJTkcgIyBQb3NpeCBQMTAwM18xQiByZWFsLXRpbWUgZXh0ZW5zaW9u cwpvcHRpb25zIAlNVVRFWF9OT0lOTElORQpvcHRpb25zIAlSV0xPQ0tfTk9JTkxJ TkUKb3B0aW9ucyAJTk9fRkZTX1NOQVBTSE9UCm9wdGlvbnMgCU5PX1NXQVBQSU5H CgojIERlYnVnZ2luZwpvcHRpb25zIAlBTFRfQlJFQUtfVE9fREVCVUdHRVIKb3B0 aW9ucyAJRERCCm9wdGlvbnMgCUtEQgoKIyBQc2V1ZG8gZGV2aWNlcwpkZXZpY2UJ CXJhbmRvbQpkZXZpY2UJCWxvb3AKZGV2aWNlCQltZAoKIyBTZXJpYWwgcG9ydHMK ZGV2aWNlCQl1YXJ0CgojIE5ldHdvcmtpbmcKZGV2aWNlCQlldGhlcgpkZXZpY2UJ CW1nZQkJCSMgTWFydmVsbCBHaWdhYml0IEV0aGVybmV0IGNvbnRyb2xsZXIKZGV2 aWNlCQltaWkKZGV2aWNlCQllMTAwMHBoeQpkZXZpY2UJCWJwZgpvcHRpb25zIAlI Wj0xMDAwCm9wdGlvbnMgCURFVklDRV9QT0xMSU5HCmRldmljZQkJdmxhbgoKZGV2 aWNlCQljZXNhCQkJIyBNYXJ2ZWxsIHNlY3VyaXR5IGVuZ2luZQpkZXZpY2UJCWNy 
eXB0bwpkZXZpY2UJCWNyeXB0b2RldgoKIyBVU0IKb3B0aW9ucyAJVVNCX0RFQlVH CQkjIGVuYWJsZSBkZWJ1ZyBtc2dzCmRldmljZQkJdXNiCmRldmljZQkJZWhjaQpk ZXZpY2UJCXVtYXNzCmRldmljZQkJc2NidXMKZGV2aWNlCQlwYXNzCmRldmljZQkJ ZGEKCiMgTkFORApkZXZpY2UJCW5hbmQKCiNkZXZpY2UgCQltbWMKI2RldmljZSAJ CW1tY3NkCiNkZXZpY2UgCQlzZGhjaQoKIyBGbGF0dGVuZWQgRGV2aWNlIFRyZWUK b3B0aW9ucyAJRkRUCm9wdGlvbnMgCUZEVF9EVEJfU1RBVElDCm1ha2VvcHRpb25z CUZEVF9EVFNfRklMRT1zaGVldmFwbHVnLmR0cwo= ------------JxeXu0Csk7rvM8RhMGHkI1 Content-Disposition: attachment; filename=svn.diff.txt Content-Type: text/plain; name=svn.diff.txt Content-Transfer-Encoding: 7bit Index: sys/arm/conf/SHEEVAPLUG =================================================================== --- sys/arm/conf/SHEEVAPLUG (revision 273634) +++ sys/arm/conf/SHEEVAPLUG (working copy) @@ -10,12 +10,20 @@ options SOC_MV_KIRKWOOD makeoptions MODULES_OVERRIDE="" -#makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols +makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols +#options INVARIANTS +#options INVARIANT_SUPPORT +#options WITNESS +#options DEBUG_LOCKS +#options DEBUG_VFS_LOCKS +#options DIAGNOSTIC makeoptions WERROR="-Werror" +option KSTACK_PAGES=5 +option ARM_DEVICE_MULTIPASS options SCHED_4BSD # 4BSD scheduler options INET # InterNETworking -options INET6 # IPv6 communications protocols +#options INET6 # IPv6 communications protocols options GEOM_PART_BSD # BSD partition scheme options GEOM_PART_MBR # MBR partition scheme options TMPFS # Efficient memory filesystem @@ -23,14 +31,22 @@ options NANDFS # NAND Filesystem options NFSCL # New Network Filesystem Client options NFSLOCKD # Network Lock Manager -options NFS_ROOT # NFS usable as /, requires NFSCL -options BOOTP -options BOOTP_NFSROOT -options BOOTP_NFSV3 -options BOOTP_WIRED_TO=mge0 +#options NFS_ROOT # NFS usable as /, requires NFSCL +#options BOOTP +#options BOOTP_NFSROOT +#options BOOTP_NFSV3 +#options BOOTP_WIRED_TO=mge0 +# Added by Ronald +options SOFTUPDATES +options MSDOSFS +options 
IPFIREWALL +options IPFIREWALL_DEFAULT_TO_ACCEPT # Root fs on USB device -#options ROOTDEVNAME=\"ufs:/dev/da0a\" +options ROOTDEVNAME=\"ufs:/dev/da0s1a\" +#options ROOTDEVNAME=\"ufs:/dev/da0s2d\" +#options ROOTDEVNAME=\"nandfs:/dev/da0s2d\" +#options ROOTDEVNAME=\"nandfs:/dev/gnand0s.root\" options SYSVSHM # SYSV-style shared memory options SYSVMSG # SYSV-style message queues @@ -49,6 +65,7 @@ # Pseudo devices device random device loop +device md # Serial ports device uart @@ -79,6 +96,10 @@ # NAND device nand +#device mmc +#device mmcsd +#device sdhci + # Flattened Device Tree options FDT options FDT_DTB_STATIC Index: sys/boot/fdt/dts/arm/sheevaplug.dts =================================================================== --- sys/boot/fdt/dts/arm/sheevaplug.dts (revision 273634) +++ sys/boot/fdt/dts/arm/sheevaplug.dts (working copy) @@ -95,7 +95,12 @@ }; slice@200000 { - reg = <0x200000 0x1fe00000>; + reg = <0x200000 0x600000>; + label = "fbsd-boot"; + }; + + slice@800000 { + reg = <0x800000 0x1f800000>; label = "root"; }; }; ------------JxeXu0Csk7rvM8RhMGHkI1-- From owner-freebsd-fs@FreeBSD.ORG Fri Nov 21 03:20:24 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A9478494 for ; Fri, 21 Nov 2014 03:20:24 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 4BD7AA77 for ; Fri, 21 Nov 2014 03:20:23 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArEEAHmublSDaFve/2dsb2JhbABahECDAs0MgxwCgRwBAQEBAX2ECSMEgRYZAgRVBohUrgWPS5ZiAQEBAQYBAQEBAQEBG5BUGSKCd4FUBYwbiF2LHIcJhFOJbIIAIIF5H4F4gQMBAQE X-IronPort-AV: E=Sophos;i="5.07,428,1413259200"; d="scan'208";a="170011465" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) 
([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 20 Nov 2014 22:19:14 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C4AA7B409E for ; Thu, 20 Nov 2014 22:19:14 -0500 (EST) Date: Thu, 20 Nov 2014 22:19:14 -0500 (EST) From: Rick Macklem To: FreeBSD Filesystems Message-ID: <539201047.4538834.1416539954794.JavaMail.root@uoguelph.ca> In-Reply-To: <683927697.4538805.1416539949195.JavaMail.root@uoguelph.ca> Subject: RFC: patch to make d_fileno 64bits MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_4538832_801680899.1416539954793" X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Nov 2014 03:20:24 -0000

------=_Part_4538832_801680899.1416539954793
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

The attached patch covers the basics of a way to convert the d_fileno field of "struct dirent" to 64bits. This patch is incomplete and won't even build, but I thought I'd post it in case anyone wanted to take a look and comment on the approach it uses.

- renames the old/current one "struct dirent32"
- changes d_fileno to 64bits and adds a 64bit d_off field for the offset of the underlying file system
- defines a new VOP_READDIR() that will return the new "struct dirent" that is used as the default one for a new getdirentries(2).
- the old/current getdirentries(2) uses the old VOP_READDIR32() by default.

For the case of a file system that supports both the new and old VOP_READDIR(), they are used by the corresponding new and old getdirentries(2) syscalls. For a file system that only supports one of the VOP_READDIR()s, the "struct dirent32" is copied to "struct dirent" (or vice versa).
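
To make the copy step concrete, here is a rough userland sketch of converting a single entry in each direction. It is only an illustration under stated assumptions and is not the code in the attached patch: the names sk_dirent32, sk_dirent64, sk_reclen32(), sk_reclen64(), sk_widen() and sk_narrow() are hypothetical, the field layouts are simplified stand-ins for the two structures described above, and refusing to narrow a file number that does not fit in 32 bits is just one possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_MAXNAMLEN 255	/* hypothetical name; same limit as MAXNAMLEN */

/* Hypothetical stand-in for the old layout ("struct dirent32"). */
struct sk_dirent32 {
	uint32_t d_fileno;	/* 32-bit file number */
	uint16_t d_reclen;	/* length of this record */
	uint8_t  d_type;
	uint8_t  d_namlen;
	char     d_name[SK_MAXNAMLEN + 1];
};

/* Hypothetical stand-in for the new layout (64-bit d_off and d_fileno). */
struct sk_dirent64 {
	uint64_t d_off;		/* directory offset of the entry */
	uint64_t d_fileno;	/* 64-bit file number */
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	char     d_name[SK_MAXNAMLEN + 1];
};

/* Header + name + NUL, rounded up to an 8-byte boundary (new layout). */
static size_t
sk_reclen64(size_t namlen)
{
	return ((offsetof(struct sk_dirent64, d_name) + namlen + 1 + 7) &
	    ~(size_t)7);
}

/* Header + name + NUL, rounded up to a 4-byte boundary (old layout). */
static size_t
sk_reclen32(size_t namlen)
{
	return ((offsetof(struct sk_dirent32, d_name) + namlen + 1 + 3) &
	    ~(size_t)3);
}

/* Old -> new: always possible, the file number is simply widened. */
static void
sk_widen(const struct sk_dirent32 *in, uint64_t off, struct sk_dirent64 *out)
{
	assert(in != NULL && out != NULL);
	memset(out, 0, sizeof(*out));	/* also NUL-terminates d_name */
	out->d_off = off;
	out->d_fileno = in->d_fileno;
	out->d_type = in->d_type;
	out->d_namlen = in->d_namlen;
	memcpy(out->d_name, in->d_name, in->d_namlen);
	out->d_reclen = (uint16_t)sk_reclen64(in->d_namlen);
}

/* New -> old: returns -1 when the file number needs more than 32 bits. */
static int
sk_narrow(const struct sk_dirent64 *in, struct sk_dirent32 *out)
{
	assert(in != NULL && out != NULL);
	if (in->d_fileno > UINT32_MAX)
		return (-1);		/* assumed policy: report, don't truncate */
	memset(out, 0, sizeof(*out));
	out->d_fileno = (uint32_t)in->d_fileno;
	out->d_type = in->d_type;
	out->d_namlen = in->d_namlen;
	memcpy(out->d_name, in->d_name, in->d_namlen);
	out->d_reclen = (uint16_t)sk_reclen32(in->d_namlen);
	return (0);
}
```

The point of the sketch is that the record length changes with the layout, so a buffer of converted entries cannot be rewritten at the same offsets in both directions: widening needs a second, larger output buffer, while narrowing can in principle be done in place.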
At this point, all file systems would support the old VOP_READDIR() and I think the new VOP_READDIR() can easily be added for NFS, ZFS. (OpenBSD already has UFS code for essentially a new struct dirent and hopefully that code could be ported easily, too.) Anyhow, any comments on this approach? rick ------=_Part_4538832_801680899.1416539954793 Content-Type: text/x-patch; name=fileno.patch Content-Disposition: attachment; filename=fileno.patch Content-Transfer-Encoding: base64 LS0tIHN5cy9kaXJlbnQuaC5zYXYJMjAxNC0xMC0yMyAxODoxMjo1OS4wMDAwMDAwMDAgLTA0MDAK KysrIHN5cy9kaXJlbnQuaAkyMDE0LTExLTE5IDE5OjEzOjEyLjAwMDAwMDAwMCAtMDUwMApAQCAt MzgsMTYgKzM4LDMxIEBACiAKIC8qCiAgKiBUaGUgZGlyZW50IHN0cnVjdHVyZSBkZWZpbmVzIHRo ZSBmb3JtYXQgb2YgZGlyZWN0b3J5IGVudHJpZXMgcmV0dXJuZWQgYnkKLSAqIHRoZSBnZXRkaXJl bnRyaWVzKDIpIHN5c3RlbSBjYWxsLgorICogdGhlIGdldGRpcmVudHJpZXMoMikgc3lzdGVtIGNh bGwgYW5kIGRpcmVudDMyIGZvciB0aGUgZ2V0ZGlyZW50cmllczMyKDIpCisgKiBzeXN0ZW0gY2Fs bC4KICAqCi0gKiBBIGRpcmVjdG9yeSBlbnRyeSBoYXMgYSBzdHJ1Y3QgZGlyZW50IGF0IHRoZSBm cm9udCBvZiBpdCwgY29udGFpbmluZyBpdHMKKyAqIEEgZGlyZWN0b3J5IGVudHJ5IGhhcyBhIHN0 cnVjdCBkaXJlbnQoMzIpIGF0IHRoZSBmcm9udCBvZiBpdCwgY29udGFpbmluZyBpdHMKICAqIGlu b2RlIG51bWJlciwgdGhlIGxlbmd0aCBvZiB0aGUgZW50cnksIGFuZCB0aGUgbGVuZ3RoIG9mIHRo ZSBuYW1lCi0gKiBjb250YWluZWQgaW4gdGhlIGVudHJ5LiAgVGhlc2UgYXJlIGZvbGxvd2VkIGJ5 IHRoZSBuYW1lIHBhZGRlZCB0byBhIDQKKyAqIGNvbnRhaW5lZCBpbiB0aGUgZW50cnkuICBUaGVz ZSBhcmUgZm9sbG93ZWQgYnkgdGhlIG5hbWUgcGFkZGVkIHRvIGEgOCg0KQogICogYnl0ZSBib3Vu ZGFyeSB3aXRoIG51bGwgYnl0ZXMuICBBbGwgbmFtZXMgYXJlIGd1YXJhbnRlZWQgbnVsbCB0ZXJt aW5hdGVkLgogICogVGhlIG1heGltdW0gbGVuZ3RoIG9mIGEgbmFtZSBpbiBhIGRpcmVjdG9yeSBp cyBNQVhOQU1MRU4uCiAgKi8KIAogc3RydWN0IGRpcmVudCB7CisJX191aW50NjRfdCBkX29mZjsJ CS8qIGRpciBvZmZzZXQgZm9yIG9uLWRpc2sgZGlyZWN0b3J5ICovCisJX191aW50NjRfdCBkX2Zp bGVubzsJCS8qIGZpbGUgbnVtYmVyIG9mIGVudHJ5ICovCisJX191aW50MTZfdCBkX3JlY2xlbjsJ CS8qIGxlbmd0aCBvZiB0aGlzIHJlY29yZCAqLworCV9fdWludDhfdCAgZF90eXBlOyAJCS8qIGZp 
bGUgdHlwZSwgc2VlIGJlbG93ICovCisJX191aW50OF90ICBkX25hbWxlbjsJCS8qIGxlbmd0aCBv ZiBzdHJpbmcgaW4gZF9uYW1lICovCisjaWYgX19CU0RfVklTSUJMRQorI2RlZmluZQlNQVhOQU1M RU4JMjU1CisJY2hhcglkX25hbWVbTUFYTkFNTEVOICsgMV07CS8qIG5hbWUgbXVzdCBiZSBubyBs b25nZXIgdGhhbiB0aGlzICovCisjZWxzZQorCWNoYXIJZF9uYW1lWzI1NSArIDFdOwkvKiBuYW1l IG11c3QgYmUgbm8gbG9uZ2VyIHRoYW4gdGhpcyAqLworI2VuZGlmCit9OworCitzdHJ1Y3QgZGly ZW50MzIgewogCV9fdWludDMyX3QgZF9maWxlbm87CQkvKiBmaWxlIG51bWJlciBvZiBlbnRyeSAq LwogCV9fdWludDE2X3QgZF9yZWNsZW47CQkvKiBsZW5ndGggb2YgdGhpcyByZWNvcmQgKi8KIAlf X3VpbnQ4X3QgIGRfdHlwZTsgCQkvKiBmaWxlIHR5cGUsIHNlZSBiZWxvdyAqLwpAQCAtODEsMjAg Kzk2LDI2IEBAIHN0cnVjdCBkaXJlbnQgewogI2RlZmluZQlEVFRPSUYoZGlydHlwZSkJKChkaXJ0 eXBlKSA8PCAxMikKIAogLyoKLSAqIFRoZSBfR0VORVJJQ19ESVJTSVogbWFjcm8gZ2l2ZXMgdGhl IG1pbmltdW0gcmVjb3JkIGxlbmd0aCB3aGljaCB3aWxsIGhvbGQKLSAqIHRoZSBkaXJlY3Rvcnkg ZW50cnkuICBUaGlzIHJldHVybnMgdGhlIGFtb3VudCBvZiBzcGFjZSBpbiBzdHJ1Y3QgZGlyZWN0 Ci0gKiB3aXRob3V0IHRoZSBkX25hbWUgZmllbGQsIHBsdXMgZW5vdWdoIHNwYWNlIGZvciB0aGUg bmFtZSB3aXRoIGEgdGVybWluYXRpbmcKLSAqIG51bGwgYnl0ZSAoZHAtPmRfbmFtbGVuKzEpLCBy b3VuZGVkIHVwIHRvIGEgNCBieXRlIGJvdW5kYXJ5LgorICogVGhlIF9HRU5FUklDX3h4eCBtYWNy b3MgZ2l2ZXMgdGhlIG1pbmltdW0gcmVjb3JkIGxlbmd0aCB3aGljaCB3aWxsCisgKiBob2xkIHRo ZSBkaXJlY3RvcnkgZW50cnkuICBUaGV5IHJldHVybiB0aGUgYW1vdW50IG9mIHNwYWNlIGluIHN0 cnVjdAorICogZGlyZW50KDMyKSB3aXRob3V0IHRoZSBkX25hbWUgZmllbGQsIHBsdXMgZW5vdWdo IHNwYWNlIGZvciB0aGUgbmFtZSB3aXRoIGEKKyAqIHRlcm1pbmF0aW5nIG51bGwgYnl0ZSAoZHAt PmRfbmFtbGVuKzEpLCByb3VuZGVkIHVwIHRvIGEgOCg0KSBieXRlIGJvdW5kYXJ5LgorICogVGhl IF9HRU5FUklDX0RJUlZBTCgpIGNhc2UgdGFrZXMgdGhlIG5hbWUgbGVuZ3RoIGluc3RlYWQgb2Yg ZHAgYXMgdGhlCisgKiBhcmd1bWVudC4KICAqCiAgKiBYWFggYWx0aG91Z2ggdGhpcyBtYWNybyBp cyBpbiB0aGUgaW1wbGVtZW50YXRpb24gbmFtZXNwYWNlLCBpdCByZXF1aXJlcwogICogYSBtYW5p ZmVzdCBjb25zdGFudCB0aGF0IGlzIG5vdC4KICAqLwotI2RlZmluZQlfR0VORVJJQ19ESVJTSVoo ZHApIFwKLSAgICAoKHNpemVvZiAoc3RydWN0IGRpcmVudCkgLSAoTUFYTkFNTEVOKzEpKSArICgo 
KGRwKS0+ZF9uYW1sZW4rMSArIDMpICZ+IDMpKQorI2RlZmluZQlfR0VORVJJQ19ESVJWQUwobmFt bGVuKSBcCisgICAgKChzaXplb2Yoc3RydWN0IGRpcmVudCkgLSAoTUFYTkFNTEVOICsgMSkgKyAo bmFtbGVuKSArIDEgKyA3KSAmIH43KQorI2RlZmluZQlfR0VORVJJQ19ESVJTSVooZHApCV9HRU5F UklDX0RJUlZBTCgoZHApLT5kX25hbWxlbikKKyNkZWZpbmUJX0dFTkVSSUNfRElSU0laMzIoZHAp IFwKKyAgICAoKHNpemVvZiAoc3RydWN0IGRpcmVudDMyKSAtIChNQVhOQU1MRU4rMSkpICsgKCgo ZHApLT5kX25hbWxlbisxICsgMykgJn4gMykpCiAjZW5kaWYgLyogX19CU0RfVklTSUJMRSAqLwog CiAjaWZkZWYgX0tFUk5FTAogI2RlZmluZQlHRU5FUklDX0RJUlNJWihkcCkJX0dFTkVSSUNfRElS U0laKGRwKQorI2RlZmluZQlHRU5FUklDX0RJUlNJWjMyKGRwKQlfR0VORVJJQ19ESVJTSVozMihk cCkKICNlbmRpZgogCiAjZW5kaWYgLyogIV9TWVNfRElSRU5UX0hfICovCi0tLSBrZXJuL3Zmc19z eXNjYWxscy5jLnNhdgkyMDE0LTEwLTI0IDE2OjQ1OjM5LjAwMDAwMDAwMCAtMDQwMAorKysga2Vy bi92ZnNfc3lzY2FsbHMuYwkyMDE0LTExLTIwIDIxOjQ2OjI5LjAwMDAwMDAwMCAtMDUwMApAQCAt NDAwNiwxMCArNDAwNiwxMSBAQCB1bmlvbnJlYWQ6CiAjZW5kaWYgLyogQ09NUEFUXzQzICovCiAK IC8qCi0gKiBSZWFkIGEgYmxvY2sgb2YgZGlyZWN0b3J5IGVudHJpZXMgaW4gYSBmaWxlc3lzdGVt IGluZGVwZW5kZW50IGZvcm1hdC4KKyAqIFJlYWQgdGhlIG9sZCAic3RydWN0IGRpcmVudDMyIiBi bG9jayBvZiBkaXJlY3RvcnkgZW50cmllcyBpbiBhCisgKiBmaWxlc3lzdGVtIGluZGVwZW5kZW50 IGZvcm1hdC4KICAqLwogI2lmbmRlZiBfU1lTX1NZU1BST1RPX0hfCi1zdHJ1Y3QgZ2V0ZGlyZW50 cmllc19hcmdzIHsKK3N0cnVjdCBnZXRkaXJlbnRyaWVzMzJfYXJncyB7CiAJaW50CWZkOwogCWNo YXIJKmJ1ZjsKIAl1X2ludAljb3VudDsKQEAgLTQwMTcsOSArNDAxOCw5IEBAIHN0cnVjdCBnZXRk aXJlbnRyaWVzX2FyZ3MgewogfTsKICNlbmRpZgogaW50Ci1zeXNfZ2V0ZGlyZW50cmllcyh0ZCwg dWFwKQorc3lzX2dldGRpcmVudHJpZXMzMih0ZCwgdWFwKQogCXN0cnVjdCB0aHJlYWQgKnRkOwot CXJlZ2lzdGVyIHN0cnVjdCBnZXRkaXJlbnRyaWVzX2FyZ3MgLyogeworCXJlZ2lzdGVyIHN0cnVj dCBnZXRkaXJlbnRyaWVzMzJfYXJncyAvKiB7CiAJCWludCBmZDsKIAkJY2hhciAqYnVmOwogCQl1 X2ludCBjb3VudDsKQEAgLTQwMjksNyArNDAzMCw3IEBAIHN5c19nZXRkaXJlbnRyaWVzKHRkLCB1 YXApCiAJbG9uZyBiYXNlOwogCWludCBlcnJvcjsKIAotCWVycm9yID0ga2Vybl9nZXRkaXJlbnRy aWVzKHRkLCB1YXAtPmZkLCB1YXAtPmJ1ZiwgdWFwLT5jb3VudCwgJmJhc2UsCisJZXJyb3IgPSBr 
ZXJuX2dldGRpcmVudHJpZXMzMih0ZCwgdWFwLT5mZCwgdWFwLT5idWYsIHVhcC0+Y291bnQsICZi YXNlLAogCSAgICBOVUxMLCBVSU9fVVNFUlNQQUNFKTsKIAlpZiAoZXJyb3IgIT0gMCkKIAkJcmV0 dXJuIChlcnJvcik7CkBAIC00MDM5LDcgKzQwNDAsNyBAQCBzeXNfZ2V0ZGlyZW50cmllcyh0ZCwg dWFwKQogfQogCiBpbnQKLWtlcm5fZ2V0ZGlyZW50cmllcyhzdHJ1Y3QgdGhyZWFkICp0ZCwgaW50 IGZkLCBjaGFyICpidWYsIHVfaW50IGNvdW50LAora2Vybl9nZXRkaXJlbnRyaWVzMzIoc3RydWN0 IHRocmVhZCAqdGQsIGludCBmZCwgY2hhciAqYnVmLCB1X2ludCBjb3VudCwKICAgICBsb25nICpi YXNlcCwgc3NpemVfdCAqcmVzaWRwLCBlbnVtIHVpb19zZWcgYnVmc2VnKQogewogCXN0cnVjdCB2 bm9kZSAqdnA7CkBAIC00MDQ4LDggKzQwNDksOSBAQCBrZXJuX2dldGRpcmVudHJpZXMoc3RydWN0 IHRocmVhZCAqdGQsIGluCiAJc3RydWN0IGlvdmVjIGFpb3Y7CiAJY2FwX3JpZ2h0c190IHJpZ2h0 czsKIAlsb25nIGxvZmY7Ci0JaW50IGVycm9yLCBlb2ZmbGFnOworCWludCBjb3B5X2RpciA9IDAs IGVycm9yLCBlb2ZmbGFnOwogCW9mZl90IGZvZmZzZXQ7CisJY2hhciAqdGJ1ZiA9IE5VTEw7CiAK IAlBVURJVF9BUkdfRkQoZmQpOwogCWlmIChjb3VudCA+IElPU0laRV9NQVgpCkBAIC00MDcwLDIy ICs0MDcyLDQ2IEBAIHVuaW9ucmVhZDoKIAkJZXJyb3IgPSBFSU5WQUw7CiAJCWdvdG8gZmFpbDsK IAl9Ci0JYWlvdi5pb3ZfYmFzZSA9IGJ1ZjsKKwl2bl9sb2NrKHZwLCBMS19TSEFSRUQgfCBMS19S RVRSWSk7Cit0cnluZXc6CisJLyoKKwkgKiBJZiB0aGlzIGZpbGUgc3lzdGVtIG9ubHkgcmV0dXJu cyB0aGUgbmV3IHN0cnVjdCBkaXJlbnQsIGFsbG9jYXRlCisJICogYSBrZXJuZWwgYnVmZmVyIHRv IGJlIHJlYWQgaW50bywgc28gaXQgY2FuIGJlIGNvcGllZC9jb252ZXJ0ZWQuCisJICovCisJaWYg KGNvcHlfZGlyICE9IDAgJiYgYnVmc2VnID09IFVJT19VU0VSU1BBQ0UpIHsKKwkJaWYgKHRidWYg PT0gTlVMTCkKKwkJCXRidWYgPSBtYWxsb2MoY291bnQsIE1fVEVNUCwgTV9XQUlUT0spOworCQlh aW92Lmlvdl9iYXNlID0gdGJ1ZjsKKwl9IGVsc2UKKwkJYWlvdi5pb3ZfYmFzZSA9IGJ1ZjsKIAlh aW92Lmlvdl9sZW4gPSBjb3VudDsKIAlhdWlvLnVpb19pb3YgPSAmYWlvdjsKIAlhdWlvLnVpb19p b3ZjbnQgPSAxOwogCWF1aW8udWlvX3J3ID0gVUlPX1JFQUQ7Ci0JYXVpby51aW9fc2VnZmxnID0g YnVmc2VnOworCWlmIChjb3B5X2RpciAhPSAwICYmIGJ1ZnNlZyA9PSBVSU9fVVNFUlNQQUNFKQor CQlhdWlvLnVpb19zZWdmbGcgPSBVSU9fU1lTU1BBQ0U7CisJZWxzZQorCQlhdWlvLnVpb19zZWdm bGcgPSBidWZzZWc7CiAJYXVpby51aW9fdGQgPSB0ZDsKLQl2bl9sb2NrKHZwLCBMS19TSEFSRUQg 
fCBMS19SRVRSWSk7CiAJQVVESVRfQVJHX1ZOT0RFMSh2cCk7CiAJbG9mZiA9IGF1aW8udWlvX29m ZnNldCA9IGZvZmZzZXQ7CiAjaWZkZWYgTUFDCiAJZXJyb3IgPSBtYWNfdm5vZGVfY2hlY2tfcmVh ZGRpcih0ZC0+dGRfdWNyZWQsIHZwKTsKIAlpZiAoZXJyb3IgPT0gMCkKICNlbmRpZgotCQllcnJv ciA9IFZPUF9SRUFERElSKHZwLCAmYXVpbywgZnAtPmZfY3JlZCwgJmVvZmZsYWcsIE5VTEwsCi0J CSAgICBOVUxMKTsKKwl7CisJCWlmIChjb3B5X2RpciA9PSAwKSB7CisJCQllcnJvciA9IFZPUF9S RUFERElSMzIodnAsICZhdWlvLCBmcC0+Zl9jcmVkLCAmZW9mZmxhZywKKwkJCSAgICBOVUxMLCBO VUxMKTsKKwkJCWlmIChlcnJvciA9PSBFT1BOT1RTVVBQKSB7CisJCQkJY29weV9kaXIgPSAxOwor CQkJCWVycm9yID0gMDsKKwkJCQlnb3RvIHRyeW5ldzsKKwkJCX0KKwkJfSBlbHNlCisJCQllcnJv ciA9IFZPUF9SRUFERElSKHZwLCAmYXVpbywgZnAtPmZfY3JlZCwgJmVvZmZsYWcsCisJCQkgICAg TlVMTCwgTlVMTCk7CisJfQogCWZvZmZzZXQgPSBhdWlvLnVpb19vZmZzZXQ7CiAJaWYgKGVycm9y ICE9IDApIHsKIAkJVk9QX1VOTE9DSyh2cCwgMCk7CkBAIC00MTAyLDE0ICs0MTI4LDIwOSBAQCB1 bmlvbnJlYWQ6CiAJCWZwLT5mX2RhdGEgPSB2cDsKIAkJZm9mZnNldCA9IDA7CiAJCXZwdXQodHZw KTsKKwkJY29weV9kaXIgPSAwOwogCQlnb3RvIHVuaW9ucmVhZDsKIAl9CiAJVk9QX1VOTE9DSyh2 cCwgMCk7CisJaWYgKGNvcHlfZGlyICE9IDAgJiYgY291bnQgLSBhdWlvLnVpb19yZXNpZCA+IDAp IHsKKwkJaWYgKGJ1ZnNlZyA9PSBVSU9fVVNFUlNQQUNFKSB7CisJCQljb3B5X2RpcmVudDMyKHRi dWYsIGNvdW50IC0gYXVpby51aW9fcmVzaWQpOworCQkJZXJyb3IgPSBjb3B5b3V0KHRidWYsIGJ1 ZiwgY291bnQgLSBhdWlvLnVpb19yZXNpZCk7CisJCQlpZiAoZXJyb3IgIT0gMCkKKwkJCQlnb3Rv IGZhaWw7CisJCX0gZWxzZQorCQkJY29weV9kaXJlbnQzMihidWYsIGNvdW50IC0gYXVpby51aW9f cmVzaWQpOworCX0KIAkqYmFzZXAgPSBsb2ZmOwogCWlmIChyZXNpZHAgIT0gTlVMTCkKIAkJKnJl c2lkcCA9IGF1aW8udWlvX3Jlc2lkOwogCXRkLT50ZF9yZXR2YWxbMF0gPSBjb3VudCAtIGF1aW8u dWlvX3Jlc2lkOwogZmFpbDoKKwlpZiAodGJ1ZiAhPSBOVUxMKQorCQlmcmVlKHRidWYsIE1fVEVN UCk7CisJZm9mZnNldF91bmxvY2soZnAsIGZvZmZzZXQsIDApOworCWZkcm9wKGZwLCB0ZCk7CisJ cmV0dXJuIChlcnJvcik7Cit9CisKKyNpZm5kZWYgX1NZU19TWVNQUk9UT19IXworc3RydWN0IGdl dGRlbnRzMzJfYXJncyB7CisJaW50IGZkOworCWNoYXIgKmJ1ZjsKKwlzaXplX3QgY291bnQ7Cit9 OworI2VuZGlmCitpbnQKK3N5c19nZXRkZW50czMyKHRkLCB1YXApCisJc3RydWN0IHRocmVhZCAq 
dGQ7CisJcmVnaXN0ZXIgc3RydWN0IGdldGRlbnRzMzJfYXJncyAvKiB7CisJCWludCBmZDsKKwkJ Y2hhciAqYnVmOworCQl1X2ludCBjb3VudDsKKwl9ICovICp1YXA7Cit7CisJc3RydWN0IGdldGRp cmVudHJpZXMzMl9hcmdzIGFwOworCisJYXAuZmQgPSB1YXAtPmZkOworCWFwLmJ1ZiA9IHVhcC0+ YnVmOworCWFwLmNvdW50ID0gdWFwLT5jb3VudDsKKwlhcC5iYXNlcCA9IE5VTEw7CisJcmV0dXJu IChzeXNfZ2V0ZGlyZW50cmllczMyKHRkLCAmYXApKTsKK30KKworLyoKKyAqIFJlYWQgaW4gdGhl IG5ldyAic3RydWN0IGRpcmVudCIgYmxvY2sgb2YgZGlyZWN0b3J5IGVudHJpZXMgaW4gYQorICog ZmlsZXN5c3RlbSBpbmRlcGVuZGVudCBmb3JtYXQuCisgKi8KKyNpZm5kZWYgX1NZU19TWVNQUk9U T19IXworc3RydWN0IGdldGRpcmVudHJpZXNfYXJncyB7CisJaW50CWZkOworCWNoYXIJKmJ1ZjsK Kwl1X2ludAljb3VudDsKKwl1aW50NjRfdCAqYmFzZXA7Cit9OworI2VuZGlmCitpbnQKK3N5c19n ZXRkaXJlbnRyaWVzKHRkLCB1YXApCisJc3RydWN0IHRocmVhZCAqdGQ7CisJcmVnaXN0ZXIgc3Ry dWN0IGdldGRpcmVudHJpZXNfYXJncyAvKiB7CisJCWludCBmZDsKKwkJY2hhciAqYnVmOworCQl1 X2ludCBjb3VudDsKKwkJdWludDY0X3QgKmJhc2VwOworCX0gKi8gKnVhcDsKK3sKKwl1aW50NjRf dCBiYXNlOworCWludCBlcnJvcjsKKworCWVycm9yID0ga2Vybl9nZXRkaXJlbnRyaWVzKHRkLCB1 YXAtPmZkLCB1YXAtPmJ1ZiwgdWFwLT5jb3VudCwgJmJhc2UsCisJICAgIE5VTEwsIFVJT19VU0VS U1BBQ0UpOworCWlmIChlcnJvciAhPSAwKQorCQlyZXR1cm4gKGVycm9yKTsKKwlpZiAodWFwLT5i YXNlcCAhPSBOVUxMKQorCQllcnJvciA9IGNvcHlvdXQoJmJhc2UsIHVhcC0+YmFzZXAsIHNpemVv Zih1aW50NjRfdCkpOworCXJldHVybiAoZXJyb3IpOworfQorCitpbnQKK2tlcm5fZ2V0ZGlyZW50 cmllcyhzdHJ1Y3QgdGhyZWFkICp0ZCwgaW50IGZkLCBjaGFyICpidWYsIHVfaW50IGNvdW50LAor ICAgIHVpbnQ2NF90ICpiYXNlcCwgc3NpemVfdCAqcmVzaWRwLCBlbnVtIHVpb19zZWcgYnVmc2Vn KQoreworCXN0cnVjdCB2bm9kZSAqdnA7CisJc3RydWN0IGZpbGUgKmZwOworCXN0cnVjdCB1aW8g YXVpbzsKKwlzdHJ1Y3QgaW92ZWMgYWlvdjsKKwljYXBfcmlnaHRzX3QgcmlnaHRzOworCXVpbnQ2 NF90IGxvZmY7CisJaW50IGNvcHlfZGlyID0gMCwgZXJyb3IsIGVvZmZsYWc7CisJb2ZmX3QgZm9m ZnNldDsKKwljaGFyICppYnVmID0gTlVMTCwgKm9idWYgPSBOVUxMOworCXVfaW50IG9idWZsZW47 CisKKwlBVURJVF9BUkdfRkQoZmQpOworCWlmIChjb3VudCA+IElPU0laRV9NQVgpCisJCXJldHVy biAoRUlOVkFMKTsKKwlhdWlvLnVpb19yZXNpZCA9IGNvdW50OworCWVycm9yID0gZ2V0dm5vZGUo 
dGQtPnRkX3Byb2MtPnBfZmQsIGZkLAorCSAgICBjYXBfcmlnaHRzX2luaXQoJnJpZ2h0cywgQ0FQ X1JFQUQpLCAmZnApOworCWlmIChlcnJvciAhPSAwKQorCQlyZXR1cm4gKGVycm9yKTsKKwlpZiAo KGZwLT5mX2ZsYWcgJiBGUkVBRCkgPT0gMCkgeworCQlmZHJvcChmcCwgdGQpOworCQlyZXR1cm4g KEVCQURGKTsKKwl9CisJdnAgPSBmcC0+Zl92bm9kZTsKKwlmb2Zmc2V0ID0gZm9mZnNldF9sb2Nr KGZwLCAwKTsKK3VuaW9ucmVhZDoKKwlpZiAodnAtPnZfdHlwZSAhPSBWRElSKSB7CisJCWVycm9y ID0gRUlOVkFMOworCQlnb3RvIGZhaWw7CisJfQorCXZuX2xvY2sodnAsIExLX1NIQVJFRCB8IExL X1JFVFJZKTsKK3RyeW9sZDoKKwkvKgorCSAqIElmIHRoaXMgZmlsZSBzeXN0ZW0gb25seSByZXR1 cm5zIHRoZSBvbGQgc3RydWN0IGRpcmVudCwgYWxsb2NhdGUKKwkgKiBrZXJuZWwgYnVmZmVycyB0 byBiZSByZWFkIGFuZCBjb3BpZWQvY29udmVydGVkIGludG8uCisJICovCisJaWYgKGNvcHlfZGly ICE9IDApIHsKKwkJaWYgKGlidWYgPT0gTlVMTCkKKwkJCWlidWYgPSBtYWxsb2MoY291bnQsIE1f VEVNUCwgTV9XQUlUT0spOworCQlpZiAob2J1ZiA9PSBOVUxMKQorCQkJb2J1ZiA9IG1hbGxvYyhj b3VudCwgTV9URU1QLCBNX1dBSVRPSyk7CisJCWFpb3YuaW92X2Jhc2UgPSBpYnVmOworCX0gZWxz ZQorCQlhaW92Lmlvdl9iYXNlID0gYnVmOworCWFpb3YuaW92X2xlbiA9IGNvdW50OworCWF1aW8u dWlvX2lvdiA9ICZhaW92OworCWF1aW8udWlvX2lvdmNudCA9IDE7CisJYXVpby51aW9fcncgPSBV SU9fUkVBRDsKKwlpZiAoY29weV9kaXIgIT0gMCkKKwkJYXVpby51aW9fc2VnZmxnID0gVUlPX1NZ U1NQQUNFOworCWVsc2UKKwkJYXVpby51aW9fc2VnZmxnID0gYnVmc2VnOworCWF1aW8udWlvX3Rk ID0gdGQ7CisJQVVESVRfQVJHX1ZOT0RFMSh2cCk7CisJbG9mZiA9IGF1aW8udWlvX29mZnNldCA9 IGZvZmZzZXQ7CisjaWZkZWYgTUFDCisJZXJyb3IgPSBtYWNfdm5vZGVfY2hlY2tfcmVhZGRpcih0 ZC0+dGRfdWNyZWQsIHZwKTsKKwlpZiAoZXJyb3IgPT0gMCkKKyNlbmRpZgorCXsKKwkJaWYgKGNv cHlfZGlyID09IDApIHsKKwkJCWVycm9yID0gVk9QX1JFQURESVIodnAsICZhdWlvLCBmcC0+Zl9j cmVkLCAmZW9mZmxhZywKKwkJCSAgICBOVUxMLCBOVUxMKTsKKwkJCWlmIChlcnJvciA9PSBFT1BO T1RTVVBQKSB7CisJCQkJY29weV9kaXIgPSAxOworCQkJCWVycm9yID0gMDsKKwkJCQlnb3RvIHRy eW9sZDsKKwkJCX0KKwkJCWZvZmZzZXQgPSBhdWlvLnVpb19vZmZzZXQ7CisJCX0gZWxzZQorCQkJ ZXJyb3IgPSBWT1BfUkVBRERJUjMyKHZwLCAmYXVpbywgZnAtPmZfY3JlZCwgJmVvZmZsYWcsCisJ CQkgICAgTlVMTCwgTlVMTCk7CisJfQorCWlmIChlcnJvciAhPSAwKSB7CisJCVZPUF9VTkxPQ0so 
dnAsIDApOworCQlnb3RvIGZhaWw7CisJfQorCWlmIChjb3VudCA9PSBhdWlvLnVpb19yZXNpZCAm JgorCSAgICAodnAtPnZfdmZsYWcgJiBWVl9ST09UKSAmJgorCSAgICAodnAtPnZfbW91bnQtPm1u dF9mbGFnICYgTU5UX1VOSU9OKSkgeworCQlzdHJ1Y3Qgdm5vZGUgKnR2cCA9IHZwOworCisJCXZw ID0gdnAtPnZfbW91bnQtPm1udF92bm9kZWNvdmVyZWQ7CisJCVZSRUYodnApOworCQlmcC0+Zl92 bm9kZSA9IHZwOworCQlmcC0+Zl9kYXRhID0gdnA7CisJCWZvZmZzZXQgPSAwOworCQl2cHV0KHR2 cCk7CisJCWNvcHlfZGlyID0gMDsKKwkJZ290byB1bmlvbnJlYWQ7CisJfQorCVZPUF9VTkxPQ0so dnAsIDApOworCWlmIChjb3B5X2RpciAhPSAwICYmIGNvdW50IC0gYXVpby51aW9fcmVzaWQgPiAw KSB7CisJCW9idWZsZW4gPSBjb3B5X2RpcmVudChpYnVmLCBjb3VudCAtIGF1aW8udWlvX3Jlc2lk LCBvYnVmLCBjb3VudCwKKwkJICAgICZmb2Zmc2V0KTsKKwkJaWYgKGJ1ZnNlZyA9PSBVSU9fVVNF UlNQQUNFKQorCQkJZXJyb3IgPSBjb3B5b3V0KG9idWYsIGJ1Ziwgb2J1Zmxlbik7CisJCWVsc2UK KwkJCWJjb3B5KG9idWYsIGJ1Ziwgb2J1Zmxlbik7CisJCWlmIChlcnJvciAhPSAwKQorCQkJZ290 byBmYWlsOworCQlpZiAocmVzaWRwICE9IE5VTEwpCisJCQkqcmVzaWRwID0gY291bnQgLSBvYnVm bGVuOworCQl0ZC0+dGRfcmV0dmFsWzBdID0gb2J1ZmxlbjsKKwl9IGVsc2UgeworCQlpZiAocmVz aWRwICE9IE5VTEwpCisJCQkqcmVzaWRwID0gYXVpby51aW9fcmVzaWQ7CisJCXRkLT50ZF9yZXR2 YWxbMF0gPSBjb3VudCAtIGF1aW8udWlvX3Jlc2lkOworCX0KKwkqYmFzZXAgPSBsb2ZmOworZmFp bDoKKwlpZiAoaWJ1ZiAhPSBOVUxMKQorCQlmcmVlKGlidWYsIE1fVEVNUCk7CisJaWYgKG9idWYg IT0gTlVMTCkKKwkJZnJlZShvYnVmLCBNX1RFTVApOwogCWZvZmZzZXRfdW5sb2NrKGZwLCBmb2Zm c2V0LCAwKTsKIAlmZHJvcChmcCwgdGQpOwogCXJldHVybiAoZXJyb3IpOwpAQCAtNDE0MSw2ICs0 MzYyLDY5IEBAIHN5c19nZXRkZW50cyh0ZCwgdWFwKQogfQogCiAvKgorICogQ29weSB0aGUgbmV3 IHN0cnVjdCBuZGlyZW50IHRvIHRoZSBvbGQgc3RydWN0IGRpcmVudCBmb3JtYXQuCisgKi8KK3N0 YXRpYyB2b2lkCitjb3B5X2RpcmVudDMyKGNoYXIgKmJ1ZiwgdV9pbnQgbGVuKQoreworCXN0cnVj dCBkaXJlbnQgKmRwOworCXN0cnVjdCBkaXJlbnQzMiAqZHAzMjsKKworCXdoaWxlIChsZW4gPiAw KSB7CisJCWRwID0gKHN0cnVjdCBkaXJlbnQgKilidWY7CisJCWRwMzIgPSAoc3RydWN0IGRpcmVu dDMyICopYnVmOworCQlkcDMyLT5kX2ZpbGVubyA9IGRwLT5kX2ZpbGVubzsKKwkJZHAzMi0+ZF9y ZWNsZW4gPSBkcC0+ZF9yZWNsZW47CisJCWRwMzItPmRfdHlwZSA9IGRwLT5kX3R5cGU7CisJCWRw 
MzItPmRfbmFtbGVuID0gZHAtPmRfbmFtbGVuOworCQliY29weShkcC0+ZF9uYW1lLCBkcDMyLT5k X25hbWUsIGRwMzItPmRfbmFtbGVuICsgMSk7CisJCWJ1ZiArPSBkcDMyLT5kX3JlY2xlbjsKKwkJ bGVuIC09IGRwMzItPmRfcmVjbGVuOworCX0KK30KKworLyoKKyAqIENvcHkgdGhlIG9sZCBzdHJ1 Y3QgZGlyZW50MzIgdG8gbmV3IHN0cnVjdCBkaXJlbnQgZm9ybWF0LgorICovCitzdGF0aWMgdV9p bnQKK2NvcHlfZGlyZW50KGNoYXIgKmlidWYsIHVfaW50IGlsZW4sIGNoYXIgKm9idWYsIHVfaW50 IG9sZW4sIG9mZl90ICpvZmZwKQoreworCXN0cnVjdCBkaXJlbnQgKmRwOworCXN0cnVjdCBkaXJl bnQzMiAqZHAzMjsKKwl1X2ludCBsZWZ0LCBvY250OworCisJZHAzMiA9IChzdHJ1Y3QgZGlyZW50 MzIgKilpYnVmOworCW9jbnQgPSAwOworCXdoaWxlIChpbGVuID4gMCAmJiBvbGVuID49IG9jbnQg KyBfR0VORVJJQ19ESVJWQUwoZHAzMi0+ZF9uYW1sZW4pKSB7CisJCWRwID0gKHN0cnVjdCBkaXJl bnQgKilvYnVmOworCQlkcC0+ZF9vZmYgPSAqb2ZmcDsKKwkJZHAtPmRfZmlsZW5vID0gZHAzMi0+ ZF9maWxlbm87CisJCWRwLT5kX3R5cGUgPSBkcDMyLT5kX3R5cGU7CisJCWRwLT5kX25hbWxlbiA9 IGRwMzItPmRfbmFtbGVuOworCQliY29weShkcDMyLT5kX25hbWUsIGRwLT5kX25hbWUsIGRwMzIt PmRfbmFtbGVuICsgMSk7CisJCWRwLT5kX3JlY2xlbiA9IF9HRU5FUklDX0RJUlNJWihkcCk7CisJ CWlidWYgKz0gZHAzMi0+ZF9yZWNsZW47CisJCWlsZW4gLT0gZHAzMi0+ZF9yZWNsZW47CisJCSpv ZmZwICs9IGRwMzItPmRfcmVjbGVuOworCQlvYnVmICs9IGRwLT5kX3JlY2xlbjsKKwkJb2NudCAr PSBkcC0+ZF9yZWNsZW47CisJCWxlZnQgPSBERVZfQlNJWkUgLSAob2NudCAmIChERVZfQlNJWkUg LSAxKSk7CisJCWRwMzIgPSAoc3RydWN0IGRpcmVudDMyICopaWJ1ZjsKKwkJaWYgKGlsZW4gPiAw ICYmIGxlZnQgPCBfR0VORVJJQ19ESVJWQUwoZHAzMi0+ZF9uYW1sZW4pKSB7CisJCQlkcC0+ZF9y ZWNsZW4gKz0gbGVmdDsKKwkJCW9idWYgKz0gbGVmdDsKKwkJCW9jbnQgKz0gbGVmdDsKKwkJfQor CX0KKwlpZiAob2NudCA8IG9sZW4pIHsKKwkJbGVmdCA9IERFVl9CU0laRSAtIChvY250ICYgKERF Vl9CU0laRSAtIDEpKTsKKwkJZHAtPmRfcmVjbGVuICs9IGxlZnQ7CisJCW9jbnQgKz0gbGVmdDsK Kwl9CisJcmV0dXJuIChvY250KTsKK30KKworLyoKICAqIFNldCB0aGUgbW9kZSBtYXNrIGZvciBj cmVhdGlvbiBvZiBmaWxlc3lzdGVtIG5vZGVzLgogICovCiAjaWZuZGVmIF9TWVNfU1lTUFJPVE9f SF8K ------=_Part_4538832_801680899.1416539954793-- From owner-freebsd-fs@FreeBSD.ORG Fri Nov 21 15:58:00 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) 
(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A7BBBC5A for ; Fri, 21 Nov 2014 15:58:00 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4A41920B for ; Fri, 21 Nov 2014 15:58:00 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sALFvtji007331 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 21 Nov 2014 17:57:55 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sALFvtji007331 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sALFvteg007330; Fri, 21 Nov 2014 17:57:55 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 21 Nov 2014 17:57:54 +0200 From: Konstantin Belousov To: Rick Macklem Subject: Re: RFC: patch to make d_fileno 64bits Message-ID: <20141121155754.GN17068@kib.kiev.ua> References: <683927697.4538805.1416539949195.JavaMail.root@uoguelph.ca> <539201047.4538834.1416539954794.JavaMail.root@uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <539201047.4538834.1416539954794.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 
21 Nov 2014 15:58:00 -0000 On Thu, Nov 20, 2014 at 10:19:14PM -0500, Rick Macklem wrote: > The attached patch covers the basics of a way to > convert the d_fileno field of "struct dirent" to > 64bits. This patch is incomplete and won't even > build, but I thought I'd post it in case anyone > wanted to take a look and comment on the approach > it uses. > > - renames the old/current one "struct dirent32" > - changes d_fileno to 64bits and adds a 64bit > d_off field for the offset of the underlying > file system > - defines a new VOP_READDIR() that will return > the new "struct dirent" that is used as the > default one for a new getdirentries(2). > - the old/current getdirentries(2) uses the old > VOP_READDIR32() by default. > > For the case of a file system that supports both > the new and old VOP_READDIR(), they are used by > the corresponding new and old getdirentries(2) > syscalls. > > For a file system that only supports one of > the VOP_READDIR()s, the "struct dirent32" > is copied to "struct dirent" (or vice versa). > > At this point, all file systems would support > the old VOP_READDIR() and I think the new > VOP_READDIR() can easily be added for NFS, > ZFS. (OpenBSD already has UFS code for > essentially a new struct dirent and hopefully > that code could be ported easily, too.) > > Anyhow, any comments on this approach? rick I do not think we need to have in-kernel compatibility shims. The work, big but relatively trivial, is to convert filesystems to use the new ino_t, even if the on-disk structures still use 32bit inode numbers. The really problematic part of this change is the usermode ABI breakage. The struct dirent is only the start of the whole issue. ino_t is embedded into more structures which are part of the contract, e.g. struct stat. We have to provide new syscalls which accept or return the affected structures. And then, there are libraries which embed ino_t into their ABI. An immediate example is fts(3) in libc. Look at FTSENT.fts_ino.
Even after the base system is fixed by properly providing the compat shims and symbol versions for the affected libraries, we get the same problem with binaries that are not from base. In summary, the issue with ino_t is that fixing the kernel is not too hard compared with the ABI issues which must be solved in usermode. From owner-freebsd-fs@FreeBSD.ORG Fri Nov 21 16:14:46 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D9DCB3CE for ; Fri, 21 Nov 2014 16:14:46 +0000 (UTC) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BB05E61B for ; Fri, 21 Nov 2014 16:14:46 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id sALGEjRb082219; Fri, 21 Nov 2014 08:14:45 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201411211614.sALGEjRb082219@chez.mckusick.com> To: Konstantin Belousov Subject: Re: RFC: patch to make d_fileno 64bits In-reply-to: <20141121155754.GN17068@kib.kiev.ua> Date: Fri, 21 Nov 2014 08:14:45 -0800 From: Kirk McKusick Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Nov 2014 16:14:46 -0000 > Date: Fri, 21 Nov 2014 17:57:54 +0200 > From: Konstantin Belousov > To: Rick Macklem > Subject: Re: RFC: patch to make d_fileno 64bits > > I do not think we need to have in-kernel compatibility shims.
> The work, big but relatively trivial, is to convert filesystems to > use the new ino_t, even if the on-disk structures still use 32bit > inode number. > > Really problematic part of this change is the usermode ABI breakage. > The struct dirent is only the start of the whole issue. ino_t is > embedded into more structures which are part of the contract, e.g. > struct stat. We have to provide new syscalls which accept or return > the affected structures. > > And then, there are libraries which embed ino_t into their ABI. > Immediate example is fts(3) in libc. Look at the FTSENT.fts_ino. Even > after the base system is fixed by properly providing the compat shims > and symbol versions for the affected libraries, we get the same problem > with the binaries not from base. > > Summary of the issue with ino_t is that it is not too hard to fix the > kernel, comparing with the ABI issues which must be solved in usermode. You are quite right that this is a big and complex process. It was first attempted as a Google Summer of Code project which was later (in August 2011) integrated in projects/ino64. The hurdle for getting it in was too high and it has since languished. We discussed the need to get this done at the MeetBSD developer summit and I agreed to take a fresh look to see if we could pull it off in time for FreeBSD-11. I have started looking at resurrecting the work done in projects/ino64 and will work with Rick to come up with a proposal. As you note, getting a kernel working with backward compatibility is straight-forward. If you have ideas on how to handle the usermode ABI issues, they would be most appreciated. 
Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Fri Nov 21 16:15:20 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1E3325E9 for ; Fri, 21 Nov 2014 16:15:20 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id ECA42634 for ; Fri, 21 Nov 2014 16:15:19 +0000 (UTC) Received: from ralph.baldwin.cx (pool-173-70-85-31.nwrknj.fios.verizon.net [173.70.85.31]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id EE63EB9B2; Fri, 21 Nov 2014 11:15:18 -0500 (EST) From: John Baldwin To: freebsd-fs@freebsd.org Subject: Re: RFC: patch to make d_fileno 64bits Date: Fri, 21 Nov 2014 10:25:45 -0500 Message-ID: <9692214.Bs3rxl0ePH@ralph.baldwin.cx> User-Agent: KMail/4.14.2 (FreeBSD/10.1-PRERELEASE; KDE/4.14.2; amd64; ; ) In-Reply-To: <539201047.4538834.1416539954794.JavaMail.root@uoguelph.ca> References: <539201047.4538834.1416539954794.JavaMail.root@uoguelph.ca> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Fri, 21 Nov 2014 11:15:19 -0500 (EST) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Nov 2014 16:15:20 -0000 On Thursday, November 20, 2014 10:19:14 PM Rick Macklem wrote: > The attached patch covers the basics of a way to > convert the d_fileno field of "struct dirent" to > 64bits. 
This patch is incomplete and won't even > build, but I thought I'd post it in case anyone > wanted to take a look and comment on the approach > it uses. > > - renames the old/current one "struct dirent32" > - changes d_fileno to 64bits and adds a 64bit > d_off field for the offset of the underlying > file system > - defines a new VOP_READDIR() that will return > the new "struct dirent" that is used as the > default one for a new getdirentries(2). > - the old/current getdirentries(2) uses the old > VOP_READDIR32() by default. > > For the case of a file system that supports both > the new and old VOP_READDIR(), they are used by > the corresponding new and old getdirentries(2) > syscalls. > > For a file system that only supports one of > the VOP_READDIR()s, the "struct dirent32" > is copied to "struct dirent" (or vice versa). > > At this point, all file systems would support > the old VOP_READDIR() and I think the new > VOP_READDIR() can easily be added for NFS, > ZFS. (OpenBSD already has UFS code for > essentially a new struct dirent and hopefully > that code could be ported easily, too.) > > Anyhow, any comments on this approach? rick I think this is already done (along with several other changes) more fully in the projects/ino64 branch in svn? 
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Fri Nov 21 18:21:18 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8C599443; Fri, 21 Nov 2014 18:21:18 +0000 (UTC) Received: from mail-pa0-x22d.google.com (mail-pa0-x22d.google.com [IPv6:2607:f8b0:400e:c03::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 61F047BD; Fri, 21 Nov 2014 18:21:18 +0000 (UTC) Received: by mail-pa0-f45.google.com with SMTP id lj1so5353857pab.18 for ; Fri, 21 Nov 2014 10:21:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=UdjYP3NsJ7duKL74c5mPyc5oM21Md2BXbYGUGdMMB2E=; b=Q5RzM6UO/CIlodj1BkFSY67Qs4DbLt+axFhGVTJ/pBDSgmp9eHoKdc1rqo9fIo+j9B ptGTYYBc32P1/qslFbZlNgJ7VA+zS55wqAgZMtkFbN3ylvHF7Pucig92preb3ZDCRklj Ls6IU8LVimHzUrKCg6df6N3rBdGmJaZgQVpMJI8v+a+j/6HEoAUQg+i5QO/VI6mlJmeF h3EjNO/9AG0w27wYtF7mpdjFcOklQr1yzJaYFzS9swnInzZqteChkXgyaKgjbjipY8vu Hs7LspXPysJcarXvHKM/ijkeGjsvSjUz5v4a21A6kqm4HOk+0ONvXKahTKzt+tazNwIB 0z0A== X-Received: by 10.66.221.198 with SMTP id qg6mr9951470pac.106.1416594077766; Fri, 21 Nov 2014 10:21:17 -0800 (PST) Received: from localhost (c-76-21-76-83.hsd1.ca.comcast.net. 
[76.21.76.83]) by mx.google.com with ESMTPSA id mr4sm5427208pdb.64.2014.11.21.10.21.16 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 21 Nov 2014 10:21:16 -0800 (PST) Sender: Gleb Kurtsou Date: Fri, 21 Nov 2014 10:22:19 -0800 From: Gleb Kurtsou To: John Baldwin Subject: Re: RFC: patch to make d_fileno 64bits Message-ID: <20141121182219.GA1076@reks> References: <539201047.4538834.1416539954794.JavaMail.root@uoguelph.ca> <9692214.Bs3rxl0ePH@ralph.baldwin.cx> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <9692214.Bs3rxl0ePH@ralph.baldwin.cx> User-Agent: Mutt/1.5.23 (2014-03-12) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Nov 2014 18:21:18 -0000 On (21/11/2014 10:25), John Baldwin wrote: > On Thursday, November 20, 2014 10:19:14 PM Rick Macklem wrote: > > The attached patch covers the basics of a way to > > convert the d_fileno field of "struct dirent" to > > 64bits. This patch is incomplete and won't even > > build, but I thought I'd post it in case anyone > > wanted to take a look and comment on the approach > > it uses. > > > > - renames the old/current one "struct dirent32" > > - changes d_fileno to 64bits and adds a 64bit > > d_off field for the offset of the underlying > > file system > > - defines a new VOP_READDIR() that will return > > the new "struct dirent" that is used as the > > default one for a new getdirentries(2). > > - the old/current getdirentries(2) uses the old > > VOP_READDIR32() by default. > > > > For the case of a file system that supports both > > the new and old VOP_READDIR(), they are used by > > the corresponding new and old getdirentries(2) > > syscalls. 
> > > > For a file system that only supports one of > > the VOP_READDIR()s, the "struct dirent32" > > is copied to "struct dirent" (or vice versa). > > > > At this point, all file systems would support > > the old VOP_READDIR() and I think the new > > VOP_READDIR() can easily be added for NFS, > > ZFS. (OpenBSD already has UFS code for > > essentially a new struct dirent and hopefully > > that code could be ported easily, too.) > > > > Anyhow, any comments on this approach? rick > > I think this is already done (along with several other changes) more fully in > the projects/ino64 branch in svn? projects/ino64 was created by mdf for merging GSoC commits, and it didn't even get halfway through. I'm currently working on merging the code to CURRENT. It's been more than 2 years, so there is quite some work in there. I intend to update the branch as soon as the code is ready for review. Besides that, the branch also changes dev_t to 64-bit, bumps MNAMELEN to 1024 and has complete ABI compatibility shims (probably except openaudit which had issues).
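The copy step quoted in this thread (a "struct dirent32" record being converted to the new "struct dirent") can be illustrated in userland. What follows is only a minimal sketch, not the code from the posted patch: the sk_ prefixed structures, field layouts, 8-byte record rounding, and helper names are all assumptions made for illustration, and storing the running offset in d_off before each record is likewise a choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_NAMEMAX 255

/* Hypothetical old-style record: 32-bit file number, no d_off. */
struct sk_dirent32 {
	uint32_t d_fileno;
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	char     d_name[SK_NAMEMAX + 1];
};

/* Hypothetical new-style record: 64-bit file number and 64-bit d_off. */
struct sk_dirent {
	uint64_t d_fileno;
	uint64_t d_off;
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	char     d_name[SK_NAMEMAX + 1];
};

/* Record length: header + name + NUL, rounded up to 8 bytes (assumed). */
static size_t
sk_reclen(size_t hdr, uint8_t namlen)
{
	return ((hdr + namlen + 1 + 7) & ~(size_t)7);
}

/* Helper for demos: append one packed old-style record, return its length. */
static size_t
sk_put_dirent32(char *buf, uint32_t fileno, uint8_t type, const char *name)
{
	const size_t hdr = offsetof(struct sk_dirent32, d_name);
	struct sk_dirent32 d = {0};
	size_t namlen = strlen(name);

	assert(namlen <= SK_NAMEMAX);
	d.d_fileno = fileno;
	d.d_type = type;
	d.d_namlen = (uint8_t)namlen;
	d.d_reclen = (uint16_t)sk_reclen(hdr, d.d_namlen);
	memset(buf, 0, d.d_reclen);
	memcpy(buf, &d, hdr);
	memcpy(buf + hdr, name, namlen);
	return (d.d_reclen);
}

/*
 * Convert packed old records in ibuf to packed new records in obuf.
 * *offp is advanced by the old record lengths consumed.  Returns the
 * number of bytes written to obuf; stops early if obuf is full or the
 * input is malformed.
 */
static size_t
sk_copy_dirent(const char *ibuf, size_t ilen, char *obuf, size_t olen,
    uint64_t *offp)
{
	const size_t ihdr = offsetof(struct sk_dirent32, d_name);
	const size_t ohdr = offsetof(struct sk_dirent, d_name);
	size_t ocnt = 0;

	while (ilen >= ihdr) {
		struct sk_dirent32 in;
		struct sk_dirent out = {0};
		size_t nreclen;

		memcpy(&in, ibuf, ihdr);	/* header fields only */
		if (in.d_reclen < ihdr + in.d_namlen + 1 || in.d_reclen > ilen)
			break;			/* malformed or truncated */
		nreclen = sk_reclen(ohdr, in.d_namlen);
		if (olen - ocnt < nreclen)
			break;			/* no room for this record */

		out.d_fileno = in.d_fileno;	/* widened to 64 bits */
		out.d_off = *offp;
		out.d_type = in.d_type;
		out.d_namlen = in.d_namlen;
		out.d_reclen = (uint16_t)nreclen;

		memset(obuf + ocnt, 0, nreclen);	/* NUL and padding */
		memcpy(obuf + ocnt, &out, ohdr);
		memcpy(obuf + ocnt + ohdr, ibuf + ihdr, in.d_namlen);

		ibuf += in.d_reclen;
		ilen -= in.d_reclen;
		*offp += in.d_reclen;
		ocnt += nreclen;
	}
	return (ocnt);
}
```

Because the new header is larger, a converted buffer can outgrow the input, which is why the sketch checks the remaining output space per record instead of assuming equal sizes.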
From owner-freebsd-fs@FreeBSD.ORG Fri Nov 21 23:11:32 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7A87C7BE; Fri, 21 Nov 2014 23:11:32 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 1DDC7B31; Fri, 21 Nov 2014 23:11:31 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AtQEAB/Gb1SDaFve/2dsb2JhbABcg2NZBIMCyQsKhhZVAoEdAQEBAQF9hAMBAQQBAQEgKyALGxgCAg0ZAikBCSYGCAcEARwEiCANtm+XFAEBAQEBAQEBAgEBAQEBAQEBGoEtjw0BARs0B4J5gVUFjCSLJYQlhEA/izCCL4dBhBsqMAeBCDmBAwEBAQ X-IronPort-AV: E=Sophos;i="5.07,433,1413259200"; d="scan'208";a="170199629" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 21 Nov 2014 18:11:30 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 414ACB3F43; Fri, 21 Nov 2014 18:11:30 -0500 (EST) Date: Fri, 21 Nov 2014 18:11:30 -0500 (EST) From: Rick Macklem To: Gleb Kurtsou Message-ID: <960332478.5203628.1416611490254.JavaMail.root@uoguelph.ca> In-Reply-To: <20141121182219.GA1076@reks> Subject: Re: RFC: patch to make d_fileno 64bits MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Nov 2014 23:11:32 -0000 Gleb Kurtsou wrote: > On (21/11/2014 10:25), John Baldwin wrote: > > On Thursday, November 
20, 2014 10:19:14 PM Rick Macklem wrote: > > > The attached patch covers the basics of a way to > > > convert the d_fileno field of "struct dirent" to > > > 64bits. This patch is incomplete and won't even > > > build, but I thought I'd post it in case anyone > > > wanted to take a look and comment on the approach > > > it uses. > > > > > > - renames the old/current one "struct dirent32" > > > - changes d_fileno to 64bits and adds a 64bit > > > d_off field for the offset of the underlying > > > file system > > > - defines a new VOP_READDIR() that will return > > > the new "struct dirent" that is used as the > > > default one for a new getdirentries(2). > > > - the old/current getdirentries(2) uses the old > > > VOP_READDIR32() by default. > > > > > > For the case of a file system that supports both > > > the new and old VOP_READDIR(), they are used by > > > the corresponding new and old getdirentries(2) > > > syscalls. > > > > > > For a file system that only supports one of > > > the VOP_READDIR()s, the "struct dirent32" > > > is copied to "struct dirent" (or vice versa). > > > > > > At this point, all file systems would support > > > the old VOP_READDIR() and I think the new > > > VOP_READDIR() can easily be added for NFS, > > > ZFS. (OpenBSD already has UFS code for > > > essentially a new struct dirent and hopefully > > > that code could be ported easily, too.) > > > > > > Anyhow, any comments on this approach? rick > > > > I think this is already done (along with several other changes) > > more fully in > > the projects/ino64 branch in svn? > > projects/ino64 was created by mdf for merging GSoC commits, and it > didn't even get half way through. > > I'm currently working on merging the code to CURRENT. It's been more > than 2 years, so there is quite some work in there. I intend to > update > the branch as soon as code is ready for review. 
> > Besides branch also changes dev_t to 64-bit, bumps MNAMELEN to 1024 > and > has complete ABI compatibility shims (probably except openaudit which > had > issues). > I was aware of Gleb's work and had pinged him to see if he could update his patch. (I'll admit I wasn't aware of projects/ino64 until this morning.) It was my understanding (which could be wrong) that the d_fileno field of "struct dirent" was a tricky bit that hadn't yet been solved, so I started there. I am certainly hoping that Gleb can find the time to update projects/ino64. (I will take a look at what is there now to see what was done w.r.t. d_fileno.) rick > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Fri Nov 21 23:45:54 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6FE9516E for ; Fri, 21 Nov 2014 23:45:54 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 1EC3ADE4 for ; Fri, 21 Nov 2014 23:45:53 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ar0EAEnOb1SDaFve/2dsb2JhbABchECDAtAAAoEdAQEBAQF9hAIBAQEDASNWBRYOCgICDRkCWQaISwm2dpcTAQEBAQEBBAEBAQEBAQEbgS2PKjQHgnmBVQWMJJQKi2+CL4dBhBsqgXiBAwEBAQ X-IronPort-AV: E=Sophos;i="5.07,434,1413259200"; d="scan'208";a="171650553" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 21 Nov 2014 18:45:52 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id
12EFDB404F; Fri, 21 Nov 2014 18:45:52 -0500 (EST) Date: Fri, 21 Nov 2014 18:45:52 -0500 (EST) From: Rick Macklem To: Konstantin Belousov Message-ID: <420608613.5215411.1416613552066.JavaMail.root@uoguelph.ca> In-Reply-To: <20141121155754.GN17068@kib.kiev.ua> Subject: Re: RFC: patch to make d_fileno 64bits MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Nov 2014 23:45:54 -0000 Kostik wrote: > On Thu, Nov 20, 2014 at 10:19:14PM -0500, Rick Macklem wrote: > > The attached patch covers the basics of a way to > > convert the d_fileno field of "struct dirent" to > > 64bits. This patch is incomplete and won't even > > build, but I thought I'd post it in case anyone > > wanted to take a look and comment on the approach > > it uses. > > > > - renames the old/current one "struct dirent32" > > - changes d_fileno to 64bits and adds a 64bit > > d_off field for the offset of the underlying > > file system > > - defines a new VOP_READDIR() that will return > > the new "struct dirent" that is used as the > > default one for a new getdirentries(2). > > - the old/current getdirentries(2) uses the old > > VOP_READDIR32() by default. > > > > For the case of a file system that supports both > > the new and old VOP_READDIR(), they are used by > > the corresponding new and old getdirentries(2) > > syscalls. > > > > For a file system that only supports one of > > the VOP_READDIR()s, the "struct dirent32" > > is copied to "struct dirent" (or vice versa). > > > > At this point, all file systems would support > > the old VOP_READDIR() and I think the new > > VOP_READDIR() can easily be added for NFS, > > ZFS. 
(OpenBSD already has UFS code for > > essentially a new struct dirent and hopefully > > that code could be ported easily, too.) > > > > Anyhow, any comments on this approach? rick > > I do not think we need to have in-kernel compatibility shims. > The work, big but relatively trivial, is to convert filesystems to > use the new ino_t, even if the on-disk structures still use 32bit > inode number. > What about old binaries that do getdirentries(2) and expect the old structure with 32bit d_fileno or the Linux compatibility stuff? I suspect that there are some old statically linked binaries out there that do/expect the old getdirentries. Having said that, most apps will use readdir(3). Do we need to somehow allow old binaries to work with a newer libc? (If so, that's going to be really nasty. I had assumed that old libc code would do old getdirentries(2) and, as such, having a working old and new getdirentries(2) would handle old binaries?) I was trying to avoid data copying for the case of an old getdirentries(2) by having file systems provide VOP_READDIR() calls for both old and new structures. It is certainly possible to have all file systems only produce the new "struct dirent" and then just do data copying/conversion to the old one. Btw, I think the new getdirentries(2) will need additional arguments, since the offset for the underlying file system needs to be provided along with the "logical offset", which is the byte offset within the directory being returned as "struct dirent"s. > Really problematic part of this change is the usermode ABI breakage. > The struct dirent is only the start of the whole issue. ino_t is > embedded into more structures which are part of the contract, e.g. > struct stat. We have to provide new syscalls which accept or return > the affected structures. > > And then, there are libraries which embed ino_t into their ABI. > Immediate example is fts(3) in libc. Look at the FTSENT.fts_ino.
Even > after the base system is fixed by properly providing the compat shims > and symbol versions for the affected libraries, we get the same > problem > with the binaries not from base. > > Summary of the issue with ino_t is that it is not too hard to fix the > kernel, comparing with the ABI issues which must be solved in > usermode. > > Yes, I was just going to look at d_fileno as a starting point. (For whatever reason d_fileno isn't defined as ino_t?) I was specifically avoiding any use of "ino_t" and saw it as something that needed to eventually change to 64 bits at the very end. I was aware of Gleb Kurtsou's work, but didn't realize it lived in projects/ino64 and he had mentioned that he was busy, but would try and find time to update the patch. I will look at projects/ino64 and it sounds like Kirk would like to figure it all out in projects/ino64 and eventually do a "super patch" to head. This sounds fine to me, if we can pull it off. rick From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 00:28:17 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A65DBA00 for ; Sat, 22 Nov 2014 00:28:17 +0000 (UTC) Received: from mail-wg0-x22f.google.com (mail-wg0-x22f.google.com [IPv6:2a00:1450:400c:c00::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 271951CC for ; Sat, 22 Nov 2014 00:28:17 +0000 (UTC) Received: by mail-wg0-f47.google.com with SMTP id n12so7897757wgh.20 for ; Fri, 21 Nov 2014 16:28:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:mime-version:content-type :content-disposition:user-agent; bh=jKIQpFG3e8VflUgKd15UFe2A9pAi7nkcuqZ/UHvG9GY=; 
b=O/uqGyLF8u7wHpolJDHpgl523isOKv5nxY+ZahgUnB3JkY8DBwYC9U3gEMKw15uAk/ 6M0sGweuS8oHo9fNI89O6puUvV5g/Zcy0ccScY/+3RiHLbocUBrXvOGJArx/0zh9p89l Z/C8iBpBZSH5reg5dv/Au2Ezre0WOCbftnRWumhjPiJEnbVuz7dQSfWEls66X2SzuNNe fYLZlYB+Fs/q/GFscX0IzpDDCaPIJqNyo8LhfN0yRfdCtBd3JIUEIBBopzZW8o67rPMQ BpPBBvyLa5Qc754/7C11LOrpwk4vWLQSa7iTvSFZTaGtgqnHxPi0qDoqTLwI7z/wYTEm qU6g== X-Received: by 10.180.13.7 with SMTP id d7mr1426130wic.57.1416616095588; Fri, 21 Nov 2014 16:28:15 -0800 (PST) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id wr9sm10095213wjb.42.2014.11.21.16.28.14 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Fri, 21 Nov 2014 16:28:14 -0800 (PST) Date: Sat, 22 Nov 2014 01:28:12 +0100 From: Mateusz Guzik To: freebsd-fs@freebsd.org Subject: atomic v_usecount and v_holdcnt Message-ID: <20141122002812.GA32289@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Nov 2014 00:28:17 -0000 The idea is that we don't need an interlock as long as we don't transition either counter 1->0 or 0->1. The patch itself is more of a PoC, so I didn't rename vholdl & friends just yet. It helps during lookups of the same vnodes, since the interlock, which was taken twice, served as a serialization point and this effect is now reduced. There are other places which can avoid the VI_LOCK + vget scheme. The patch below survived make -j 40 buildworld, poudriere with 40 workers etc on a 2 package(s) x 10 core(s) x 2 SMT threads machine with and without debugs (including DEBUG_VFS_LOCKS). Perf difference: in a crap microbenchmark of 40 threads doing a stat on /foo/bar/baz/quux${N}, where each thread stats a separate file, I got a speedup of over 4 times on tmpfs.
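The idea stated above (skip the interlock unless a counter crosses 0->1 or 1->0) can be tried outside the kernel. The following is a rough userland sketch under stated assumptions, not the kernel patch: C11 atomics stand in for atomic_cmpset_int, a simple spin flag stands in for the vnode interlock, and struct sk_obj, sk_ref, sk_unref and the slow_paths counter are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

struct sk_obj {
	atomic_flag interlock;	/* stand-in for the vnode interlock */
	atomic_uint usecount;
	unsigned    slow_paths;	/* interlocked updates; guarded by interlock */
};

static void
sk_obj_init(struct sk_obj *o)
{
	atomic_flag_clear(&o->interlock);
	atomic_init(&o->usecount, 0);
	o->slow_paths = 0;
}

static void
sk_lock(struct sk_obj *o)
{
	while (atomic_flag_test_and_set(&o->interlock))
		;		/* spin until the flag is released */
}

static void
sk_unlock(struct sk_obj *o)
{
	atomic_flag_clear(&o->interlock);
}

/* Take a reference; only a possible 0->1 transition uses the interlock. */
static void
sk_ref(struct sk_obj *o)
{
	unsigned old = atomic_load(&o->usecount);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->usecount, &old, old + 1))
			return;
	}
	sk_lock(o);
	atomic_fetch_add(&o->usecount, 1);
	o->slow_paths++;
	sk_unlock(o);
}

/*
 * Drop a reference; only a possible 1->0 transition uses the interlock.
 * Returns 1 if this call released the last reference.
 */
static int
sk_unref(struct sk_obj *o)
{
	unsigned old = atomic_load(&o->usecount);
	int last;

	assert(old > 0);
	while (old > 1) {
		if (atomic_compare_exchange_weak(&o->usecount, &old, old - 1))
			return (0);
	}
	sk_lock(o);
	last = (atomic_fetch_sub(&o->usecount, 1) == 1);
	o->slow_paths++;
	sk_unlock(o);
	return (last);
}
```

In this sketch the compare-and-swap loop never moves the counter across zero, so any code that needs to act on the 0->1 or 1->0 edge can still rely on holding the lock while that edge happens.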
Comments? diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c b/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c index 83f29c1..b587ebd 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c @@ -99,6 +99,6 @@ vn_rele_async(vnode_t *vp, taskq_t *taskq) (task_func_t *)vn_rele_inactive, vp, TQ_SLEEP) != 0); return; } - vp->v_usecount--; + refcount_release(&vp->v_usecount); vdropl(vp); } diff --git a/sys/kern/vfs_cache.c b/sys/kern/vfs_cache.c index 55e3217..ec26d35 100644 --- a/sys/kern/vfs_cache.c +++ b/sys/kern/vfs_cache.c @@ -665,12 +665,12 @@ success: ltype = VOP_ISLOCKED(dvp); VOP_UNLOCK(dvp, 0); } - VI_LOCK(*vpp); + vholdl(*vpp); if (wlocked) CACHE_WUNLOCK(); else CACHE_RUNLOCK(); - error = vget(*vpp, cnp->cn_lkflags | LK_INTERLOCK, cnp->cn_thread); + error = vget_held(*vpp, cnp->cn_lkflags, cnp->cn_thread); if (cnp->cn_flags & ISDOTDOT) { vn_lock(dvp, ltype | LK_RETRY); if (dvp->v_iflag & VI_DOOMED) { @@ -1376,9 +1376,9 @@ vn_dir_dd_ino(struct vnode *vp) if ((ncp->nc_flag & NCF_ISDOTDOT) != 0) continue; ddvp = ncp->nc_dvp; - VI_LOCK(ddvp); + vholdl(ddvp); CACHE_RUNLOCK(); - if (vget(ddvp, LK_INTERLOCK | LK_SHARED | LK_NOWAIT, curthread)) + if (vget_held(ddvp, LK_SHARED | LK_NOWAIT, curthread)) return (NULL); return (ddvp); } diff --git a/sys/kern/vfs_hash.c b/sys/kern/vfs_hash.c index 0271e49..d2fdbba 100644 --- a/sys/kern/vfs_hash.c +++ b/sys/kern/vfs_hash.c @@ -83,9 +83,9 @@ vfs_hash_get(const struct mount *mp, u_int hash, int flags, struct thread *td, s continue; if (fn != NULL && fn(vp, arg)) continue; - VI_LOCK(vp); + vhold(vp); mtx_unlock(&vfs_hash_mtx); - error = vget(vp, flags | LK_INTERLOCK, td); + error = vget_held(vp, flags, td); if (error == ENOENT && (flags & LK_NOWAIT) == 0) break; if (error) @@ -127,9 +127,9 @@ vfs_hash_insert(struct vnode *vp, u_int hash, int flags, struct thread *td, stru continue; if (fn != NULL && fn(vp2, arg)) continue; - VI_LOCK(vp2); + 
vhold(vp2); mtx_unlock(&vfs_hash_mtx); - error = vget(vp2, flags | LK_INTERLOCK, td); + error = vget_held(vp2, flags, td); if (error == ENOENT && (flags & LK_NOWAIT) == 0) break; mtx_lock(&vfs_hash_mtx); diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c index 345aad6..53bdf0d 100644 --- a/sys/kern/vfs_subr.c +++ b/sys/kern/vfs_subr.c @@ -854,7 +854,7 @@ vnlru_free(int count) */ freevnodes--; vp->v_iflag &= ~VI_FREE; - vp->v_holdcnt++; + refcount_acquire(&vp->v_holdcnt); mtx_unlock(&vnode_free_list_mtx); VI_UNLOCK(vp); @@ -2052,12 +2052,7 @@ v_incr_usecount(struct vnode *vp) CTR2(KTR_VFS, "%s: vp %p", __func__, vp); vholdl(vp); - vp->v_usecount++; - if (vp->v_type == VCHR && vp->v_rdev != NULL) { - dev_lock(); - vp->v_rdev->si_usecount++; - dev_unlock(); - } + v_upgrade_usecount(vp); } /* @@ -2067,9 +2062,24 @@ v_incr_usecount(struct vnode *vp) static void v_upgrade_usecount(struct vnode *vp) { + int old, locked; CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - vp->v_usecount++; +retry: + old = vp->v_usecount; + if (old > 0) { + if (atomic_cmpset_int(&vp->v_usecount, old, old + 1)) + goto dev; + goto retry; + } + + locked = mtx_owned(VI_MTX(vp)); + if (!locked) + VI_LOCK(vp); + refcount_acquire(&vp->v_usecount); + if (!locked) + VI_UNLOCK(vp); +dev: if (vp->v_type == VCHR && vp->v_rdev != NULL) { dev_lock(); vp->v_rdev->si_usecount++; @@ -2086,16 +2096,10 @@ static void v_decr_usecount(struct vnode *vp) { - ASSERT_VI_LOCKED(vp, __FUNCTION__); VNASSERT(vp->v_usecount > 0, vp, ("v_decr_usecount: negative usecount")); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - vp->v_usecount--; - if (vp->v_type == VCHR && vp->v_rdev != NULL) { - dev_lock(); - vp->v_rdev->si_usecount--; - dev_unlock(); - } + v_decr_useonly(vp); vdropl(vp); } @@ -2108,12 +2112,27 @@ v_decr_usecount(struct vnode *vp) static void v_decr_useonly(struct vnode *vp) { + int old, locked; - ASSERT_VI_LOCKED(vp, __FUNCTION__); VNASSERT(vp->v_usecount > 0, vp, ("v_decr_useonly: negative usecount")); CTR2(KTR_VFS, 
"%s: vp %p", __func__, vp); - vp->v_usecount--; +retry: + old = vp->v_usecount; + if (old > 1) { + if (atomic_cmpset_int(&vp->v_usecount, old, old - 1)) { + goto dev; + } + goto retry; + } + + locked = mtx_owned(VI_MTX(vp)); + if (!locked) + VI_LOCK(vp); + refcount_release(&vp->v_usecount); + if (!locked) + VI_UNLOCK(vp); +dev: if (vp->v_type == VCHR && vp->v_rdev != NULL) { dev_lock(); vp->v_rdev->si_usecount--; @@ -2129,19 +2148,15 @@ v_decr_useonly(struct vnode *vp) * vput try to do it here. */ int -vget(struct vnode *vp, int flags, struct thread *td) +vget_held(struct vnode *vp, int flags, struct thread *td) { int error; - error = 0; VNASSERT((flags & LK_TYPE_MASK) != 0, vp, ("vget: invalid lock operation")); CTR3(KTR_VFS, "%s: vp %p with flags %d", __func__, vp, flags); - if ((flags & LK_INTERLOCK) == 0) - VI_LOCK(vp); - vholdl(vp); - if ((error = vn_lock(vp, flags | LK_INTERLOCK)) != 0) { + if ((error = vn_lock(vp, flags)) != 0) { vdrop(vp); CTR2(KTR_VFS, "%s: impossible to lock vnode %p", __func__, vp); @@ -2149,7 +2164,6 @@ vget(struct vnode *vp, int flags, struct thread *td) } if (vp->v_iflag & VI_DOOMED && (flags & LK_RETRY) == 0) panic("vget: vn_lock failed to return ENOENT\n"); - VI_LOCK(vp); /* Upgrade our holdcnt to a usecount. */ v_upgrade_usecount(vp); /* @@ -2159,15 +2173,26 @@ vget(struct vnode *vp, int flags, struct thread *td) * we don't succeed no harm is done. */ if (vp->v_iflag & VI_OWEINACT) { - if (VOP_ISLOCKED(vp) == LK_EXCLUSIVE && - (flags & LK_NOWAIT) == 0) - vinactive(vp, td); - vp->v_iflag &= ~VI_OWEINACT; + VI_LOCK(vp); + if (vp->v_iflag & VI_OWEINACT) { + if (VOP_ISLOCKED(vp) == LK_EXCLUSIVE && + (flags & LK_NOWAIT) == 0) + vinactive(vp, td); + vp->v_iflag &= ~VI_OWEINACT; + } + VI_UNLOCK(vp); } - VI_UNLOCK(vp); return (0); } +int +vget(struct vnode *vp, int flags, struct thread *td) +{ + + vhold(vp); + return (vget_held(vp, flags, td)); +} + /* * Increase the reference count of a vnode. 
*/ @@ -2176,9 +2201,7 @@ vref(struct vnode *vp) { CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - VI_LOCK(vp); v_incr_usecount(vp); - VI_UNLOCK(vp); } /* @@ -2195,9 +2218,7 @@ vrefcnt(struct vnode *vp) { int usecnt; - VI_LOCK(vp); usecnt = vp->v_usecount; - VI_UNLOCK(vp); return (usecnt); } @@ -2210,6 +2231,7 @@ static void vputx(struct vnode *vp, int func) { int error; + int old; KASSERT(vp != NULL, ("vputx: null vp")); if (func == VPUTX_VUNREF) @@ -2219,11 +2241,26 @@ vputx(struct vnode *vp, int func) else KASSERT(func == VPUTX_VRELE, ("vputx: wrong func")); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); + +retry: + old = vp->v_usecount; + if (old > 1) { + if (atomic_cmpset_int(&vp->v_usecount, old, old - 1)) { + if (func == VPUTX_VPUT) + VOP_UNLOCK(vp, 0); + if (vp->v_type == VCHR && vp->v_rdev != NULL) { + dev_lock(); + vp->v_rdev->si_usecount--; + dev_unlock(); + } + vdropl(vp); + return; + } + goto retry; + } + VI_LOCK(vp); - /* Skip this v_writecount check if we're going to panic below. */ - VNASSERT(vp->v_writecount < vp->v_usecount || vp->v_usecount < 1, vp, - ("vputx: missed vn_close")); error = 0; if (vp->v_usecount > 1 || ((vp->v_iflag & VI_DOINGINACT) && @@ -2320,9 +2357,7 @@ void vhold(struct vnode *vp) { - VI_LOCK(vp); vholdl(vp); - VI_UNLOCK(vp); } /* @@ -2332,20 +2367,30 @@ void vholdl(struct vnode *vp) { struct mount *mp; + int old; + int locked; CTR2(KTR_VFS, "%s: vp %p", __func__, vp); -#ifdef INVARIANTS - /* getnewvnode() calls v_incr_usecount() without holding interlock. 
*/ - if (vp->v_type != VNON || vp->v_data != NULL) { - ASSERT_VI_LOCKED(vp, "vholdl"); - VNASSERT(vp->v_holdcnt > 0 || (vp->v_iflag & VI_FREE) != 0, - vp, ("vholdl: free vnode is held")); +retry: + old = vp->v_holdcnt; + if (old > 0) { + if (atomic_cmpset_int(&vp->v_holdcnt, old, old + 1)) { + VNASSERT((vp->v_iflag & VI_FREE) == 0, vp, + ("vholdl: vnode with usecount is free")); + return; + } + goto retry; } -#endif - vp->v_holdcnt++; - if ((vp->v_iflag & VI_FREE) == 0) + locked = mtx_owned(VI_MTX(vp)); + if (!locked) + VI_LOCK(vp); + if ((vp->v_iflag & VI_FREE) == 0) { + refcount_acquire(&vp->v_holdcnt); + if (!locked) + VI_UNLOCK(vp); return; - VNASSERT(vp->v_holdcnt == 1, vp, ("vholdl: wrong hold count")); + } + VNASSERT(vp->v_holdcnt == 0, vp, ("vholdl: wrong hold count")); VNASSERT(vp->v_op != NULL, vp, ("vholdl: vnode already reclaimed.")); /* * Remove a vnode from the free list, mark it as in use, @@ -2362,6 +2407,9 @@ vholdl(struct vnode *vp) TAILQ_INSERT_HEAD(&mp->mnt_activevnodelist, vp, v_actfreelist); mp->mnt_activevnodelistsize++; mtx_unlock(&vnode_free_list_mtx); + refcount_acquire(&vp->v_holdcnt); + if (!locked) + VI_UNLOCK(vp); } /* @@ -2372,7 +2420,6 @@ void vdrop(struct vnode *vp) { - VI_LOCK(vp); vdropl(vp); } @@ -2387,15 +2434,27 @@ vdropl(struct vnode *vp) struct bufobj *bo; struct mount *mp; int active; + int old; + int locked; - ASSERT_VI_LOCKED(vp, "vdropl"); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); if (vp->v_holdcnt <= 0) panic("vdrop: holdcnt %d", vp->v_holdcnt); - vp->v_holdcnt--; - VNASSERT(vp->v_holdcnt >= vp->v_usecount, vp, - ("hold count less than use count")); - if (vp->v_holdcnt > 0) { + locked = mtx_owned(VI_MTX(vp)); +retry: + old = vp->v_holdcnt; + if (old > 1) { + if (atomic_cmpset_int(&vp->v_holdcnt, old, old -1)) { + if (locked) + VI_UNLOCK(vp); + return; + } + goto retry; + } + + if (!locked) + VI_LOCK(vp); + if (refcount_release(&vp->v_holdcnt) == 0) { VI_UNLOCK(vp); return; } diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h 
index c78b9d1..87eee4a 100644 --- a/sys/sys/vnode.h +++ b/sys/sys/vnode.h @@ -651,6 +651,7 @@ void vdrop(struct vnode *); void vdropl(struct vnode *); int vflush(struct mount *mp, int rootrefs, int flags, struct thread *td); int vget(struct vnode *vp, int lockflag, struct thread *td); +int vget_held(struct vnode *vp, int lockflag, struct thread *td); void vgone(struct vnode *vp); void vhold(struct vnode *); void vholdl(struct vnode *); From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 00:33:08 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 32339A9B; Sat, 22 Nov 2014 00:33:08 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id CB866282; Sat, 22 Nov 2014 00:33:07 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AtQEANbYb1SDaFve/2dsb2JhbABcg2NZBIMCyQsKhhZVAoEdAQEBAQF9hAIBAQEDAQEBASArIAsFFhgCAg0ZAikBCSYGCAcEARwEiBcJDbZdlxQBAQEBAQEBAQIBAQEBAQEBARqBLY8NAQEbNAeCeYFVBYwkiyWEJYRAP4MaiBaCL4dBggIggXkqMAeBCDmBAwEBAQ X-IronPort-AV: E=Sophos;i="5.07,434,1413259200"; d="scan'208";a="171656842" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 21 Nov 2014 19:33:06 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 8F645B3F2B; Fri, 21 Nov 2014 19:33:06 -0500 (EST) Date: Fri, 21 Nov 2014 19:33:06 -0500 (EST) From: Rick Macklem To: Gleb Kurtsou Message-ID: <1346064334.5234809.1416616386575.JavaMail.root@uoguelph.ca> In-Reply-To: <20141121182219.GA1076@reks> Subject: Re: RFC: patch to make d_fileno 64bits MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit 
X-Originating-IP: [172.17.91.209] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Nov 2014 00:33:08 -0000 Gleb Kurtsou wrote: > On (21/11/2014 10:25), John Baldwin wrote: > > On Thursday, November 20, 2014 10:19:14 PM Rick Macklem wrote: > > > The attached patch covers the basics of a way to > > > convert the d_fileno field of "struct dirent" to > > > 64bits. This patch is incomplete and won't even > > > build, but I thought I'd post it in case anyone > > > wanted to take a look and comment on the approach > > > it uses. > > > > > > - renames the old/current one "struct dirent32" > > > - changes d_fileno to 64bits and adds a 64bit > > > d_off field for the offset of the underlying > > > file system > > > - defines a new VOP_READDIR() that will return > > > the new "struct dirent" that is used as the > > > default one for a new getdirentries(2). > > > - the old/current getdirentries(2) uses the old > > > VOP_READDIR32() by default. > > > > > > For the case of a file system that supports both > > > the new and old VOP_READDIR(), they are used by > > > the corresponding new and old getdirentries(2) > > > syscalls. > > > > > > For a file system that only supports one of > > > the VOP_READDIR()s, the "struct dirent32" > > > is copied to "struct dirent" (or vice versa). > > > > > > At this point, all file systems would support > > > the old VOP_READDIR() and I think the new > > > VOP_READDIR() can easily be added for NFS, > > > ZFS. (OpenBSD already has UFS code for > > > essentially a new struct dirent and hopefully > > > that code could be ported easily, too.) > > > > > > Anyhow, any comments on this approach? 
rick > > > > I think this is already done (along with several other changes) > > more fully in > > the projects/ino64 branch in svn? > > projects/ino64 was created by mdf for merging GSoC commits, and it > didn't even get half way through. > > I'm currently working on merging the code to CURRENT. It's been more > than 2 years, so there is quite some work in there. I intend to > update > the branch as soon as code is ready for review. > Btw, I just took a quick look and I didn't find any changes to "struct dirent" in projects/ino64, so I think my original assumption that this piece of the puzzle hadn't yet been solved is correct. (Gleb, if you had changes to "struct dirent" and related fs changes, please let me know.) Oh, and thanks to some comments, the new struct dirent has already changed to: struct dirent { __uint64_t d_cookie; /* dir cookie for next dir entry */ __uint64_t d_fileno; __uint16_t d_reclen; __uint8_t d_type; __uint8_t d_namlen; __uint8_t d_pad[4]; /* align d_name to 8 byte boundary */ __uint8_t d_name[MAXNAMLEN + 1]; }; It was pointed out that C would pad the structure to a multiple of 8 bytes for some arches and without d_pad that would imply d_name wasn't at the end of the structure. (Apparently code somewhere finds d_name by subtracting MAXNAMLEN + 1 from sizeof(struct dirent) and this fails if d_name isn't at the end. Yuck, but the above fixes it.) However, the size of d_namlen could become uint16_t, if anyone thinks MAXNAMLEN might want to be greater than 255 someday (a long way off, since that's another ABI change). rick > Besides branch also changes dev_t to 64-bit, bumps MNAMELEN to 1024 > and > has complete ABI compatibility shims (probably except openaudit which > had > issues). 
> > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 09:25:33 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4698E9E0 for ; Sat, 22 Nov 2014 09:25:33 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AF02F975 for ; Sat, 22 Nov 2014 09:25:32 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sAM9PRrq046793 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 22 Nov 2014 11:25:27 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sAM9PRrq046793 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sAM9PRT1046791; Sat, 22 Nov 2014 11:25:27 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Nov 2014 11:25:27 +0200 From: Konstantin Belousov To: Mateusz Guzik Subject: Re: atomic v_usecount and v_holdcnt Message-ID: <20141122092527.GT17068@kib.kiev.ua> References: <20141122002812.GA32289@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141122002812.GA32289@dft-labs.eu> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Nov 2014 09:25:33 -0000 On Sat, Nov 22, 2014 at 01:28:12AM +0100, Mateusz Guzik wrote: > The idea is that we don't need an interlock as long as we don't > transition either counter 1->0 or 0->1. > > Patch itself is more of a PoC, so I didn't rename vholdl & friends just > yet. > > It helps during lookups with same vnodes since the interlock which was > taken twice served as a serialization point and this effect is now > reduced. > > There are other places which can avoid VI_LOCK + vget scheme. > > Patch below survived make -j 40 buildworld, poudriere with 40 workers > etc on a 2 package(s) x 10 core(s) x 2 SMT threads machine with and > without debugs (including DEBUG_VFS_LOCKS). > > Perf difference: > > in a crap microbenchmark of 40 threads doing a stat on > /foo/bar/baz/quux${N}, where each thread stats a separate file I got > over 4 times speed up on tmpfs. > > Comments? I already said that something along the lines of the patch should work. In fact, you need vnode lock when hold count changes between 0 and 1, and probably the same for use count. Some notes about the patch. mtx_owned() braces are intolerable ugliness. You should either pass a boolean flag (preferred), or create locked/unlocked versions of the functions. Similarly, I dislike vget_held(). Add a flag to vget(), see LK_EATTR_MASK in sys/lockmgr.h. Could there be consequences of not taking vnode interlock and passing LK_INTERLOCK to vn_lock() in vget()? Taking interlock when vnode lock is already owned is probably fine and does not add to contention. I mean that making VI_OWEINACT so loose breaks the VOP_INACTIVE() contract. 
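For readers following the thread, here is a rough userland sketch of the counting rule being discussed. It is not the patch itself and not kernel code. The assumption is that a pthread mutex can stand in for the vnode interlock and C11 atomics for atomic_cmpset_int(); struct obj, obj_acquire(), obj_release() and the slow_* counters are names made up for this illustration only. The point it tries to show: a counter may be bumped lock-free only while the change is neither 0->1 nor 1->0, and those two transitions fall back to the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a vnode: one counter plus its interlock. */
struct obj {
	pthread_mutex_t	interlock;	/* plays the role of the vnode interlock */
	atomic_int	count;		/* plays the role of v_holdcnt */
	int		slow_acquires;	/* 0->1 transitions, done under the lock */
	int		slow_releases;	/* 1->0 transitions, done under the lock */
};

static void
obj_init(struct obj *o)
{
	pthread_mutex_init(&o->interlock, NULL);
	atomic_init(&o->count, 0);
	o->slow_acquires = 0;
	o->slow_releases = 0;
}

/* Lock-free count -> count + 1; refuses to perform the 0->1 transition. */
static bool
obj_acquire_fast(struct obj *o)
{
	int old = atomic_load(&o->count);

	while (old > 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
			return (true);
	}
	return (false);
}

static void
obj_acquire(struct obj *o)
{
	if (obj_acquire_fast(o))
		return;
	/* The counter looked like 0: take the lock for the transition. */
	pthread_mutex_lock(&o->interlock);
	if (atomic_fetch_add(&o->count, 1) == 0)
		o->slow_acquires++;	/* free-list style work would go here */
	pthread_mutex_unlock(&o->interlock);
}

/* Lock-free count -> count - 1; refuses to perform the 1->0 transition. */
static bool
obj_release_fast(struct obj *o)
{
	int old = atomic_load(&o->count);

	while (old > 1) {
		if (atomic_compare_exchange_weak(&o->count, &old, old - 1))
			return (true);
	}
	return (false);
}

/* Returns true if this call dropped the last reference. */
static bool
obj_release(struct obj *o)
{
	bool last;

	if (obj_release_fast(o))
		return (false);
	pthread_mutex_lock(&o->interlock);
	last = (atomic_fetch_sub(&o->count, 1) == 1);
	if (last)
		o->slow_releases++;
	pthread_mutex_unlock(&o->interlock);
	return (last);
}
```

In this sketch callers never enter with the lock already held, which sidesteps the mtx_owned() question raised above; a kernel version would presumably need either a flag argument or separate locked/unlocked entry points, as suggested in the review. The sketch also says nothing about VI_OWEINACT or the vnode lock itself.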
From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 14:21:25 2014 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 3E65884A for ; Sat, 22 Nov 2014 14:21:25 +0000 (UTC) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 263377F3 for ; Sat, 22 Nov 2014 14:21:25 +0000 (UTC) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id sAMELPWD095065 for ; Sat, 22 Nov 2014 14:21:25 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 141305] [zfs] FreeBSD ZFS+sendfile severe performance issues (no cache) Date: Sat, 22 Nov 2014 14:21:25 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: unspecified X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: me@nileshgr.com X-Bugzilla-Status: Issue Resolved X-Bugzilla-Priority: Normal X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Nov 2014 14:21:25 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=141305 
me@nileshgr.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |me@nileshgr.com --- Comment #5 from me@nileshgr.com --- Is this really fixed? I'm using 10.1 and I had enabled Sendfile in Apache 2.4. System load was 10-13 (on a quad core with HT). The moment I disabled sendfile, system load went down to < 1. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 15:34:34 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A1A8483F for ; Sat, 22 Nov 2014 15:34:34 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 44DB5E2B for ; Sat, 22 Nov 2014 15:34:34 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sAMFYSjJ044615 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 22 Nov 2014 17:34:28 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sAMFYSjJ044615 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sAMFYRvc044614; Sat, 22 Nov 2014 17:34:27 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Nov 2014 17:34:27 +0200 From: Konstantin Belousov To: Rick Macklem Subject: Re: RFC: patch to make d_fileno 64bits Message-ID: <20141122153427.GW17068@kib.kiev.ua> References: <20141121155754.GN17068@kib.kiev.ua> <420608613.5215411.1416613552066.JavaMail.root@uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; 
charset=us-ascii Content-Disposition: inline In-Reply-To: <420608613.5215411.1416613552066.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Nov 2014 15:34:34 -0000 On Fri, Nov 21, 2014 at 06:45:52PM -0500, Rick Macklem wrote: > Kostik wrote: > > On Thu, Nov 20, 2014 at 10:19:14PM -0500, Rick Macklem wrote: > > > The attached patch covers the basics of a way to > > > convert the d_fileno field of "struct dirent" to > > > 64bits. This patch is incomplete and won't even > > > build, but I thought I'd post it in case anyone > > > wanted to take a look and comment on the approach > > > it uses. > > > > > > - renames the old/current one "struct dirent32" > > > - changes d_fileno to 64bits and adds a 64bit > > > d_off field for the offset of the underlying > > > file system > > > - defines a new VOP_READDIR() that will return > > > the new "struct dirent" that is used as the > > > default one for a new getdirentries(2). > > > - the old/current getdirentries(2) uses the old > > > VOP_READDIR32() by default. > > > > > > For the case of a file system that supports both > > > the new and old VOP_READDIR(), they are used by > > > the corresponding new and old getdirentries(2) > > > syscalls. > > > > > > For a file system that only supports one of > > > the VOP_READDIR()s, the "struct dirent32" > > > is copied to "struct dirent" (or vice versa). 
> > > > > > At this point, all file systems would support > > > the old VOP_READDIR() and I think the new > > > VOP_READDIR() can easily be added for NFS, > > > ZFS. (OpenBSD already has UFS code for > > > essentially a new struct dirent and hopefully > > > that code could be ported easily, too.) > > > > > > Anyhow, any comments on this approach? rick > > > > I do not think we need to have in-kernel compatibility shims. > > The work, big but relatively trivial, is to convert filesystems to > > use the new ino_t, even if the on-disk structures still use 32bit > > inode number. > > > What about old binaries that do getdirentries(2) and expect the old > structure with 32bit d_fileno or the linux compatibility stuff? > I suspect that there are some old staticly linked binaries out there > that does/expects the old getdirentries. No, let me restate my position. There are two places for backward compatibility, on is in-kernel binary interface, and another is applications, i.e. KBI and ABI. My opinion is that we must provide strict backward ABI compatibility to have even right to be called useful OS. In particular, the syscalls like current getdirentries (156 and 196) providing 32-bit inonums, must be kept with their current binary contract. The userspace issues do not end there, but this is not the currently discussed item. On the other hand, providing KBI compat for filesystems which work right now with 32bit inode numbers, should not be done. I.e., no VOP_READDIR_32INO(), all filesystems must be converted once. For syscalls 156 and 196 (and some more), the converter must be written in the vfs_syscalls.c which translates the new dirents into old dirents, at the level of best efforts. > > Having said that, most apps will use readdir(3). Do we need to somehow > allow old binaries work with a newer libc? (If so, that's going to be > really nasty. 
I had assumed that old libc code would do old > getdirentries(2) and, as such, having a working old and new getdirentries(2) > would handle old binaries? > > I was trying to avoid data copying for the case of an old getdirentries(2) > by having file systems provide VOP_READDIR() calls for both old and new > structures. > It is certainly possible to have all file systems only produce the new > "struct dirent" and then just do data copying/conversion to the old one. > > Btw, I think the new getdirentries(2) will need additional arguments, > since the offset for the underlying file system needs to be provided > along with the "logical offset", which is the byte offset within the > directory being returned as "struct dirent"s. > > > Really problematic part of this change is the usermode ABI breakage. > > The struct dirent is only the start of the whole issue. ino_t is > > embedded into more structures which are part of the contract, e.g. > > struct stat. We have to provide new syscalls which accept or return > > the affected structures. > > > > And then, there are libraries which embed ino_t into their ABI. > > Immediate example is fts(3) in libc. Look at the FTSENT.fts_ino. Even > > after the base system is fixed by properly providing the compat shims > > and symbol versions for the affected libraries, we get the same > > problem > > with the binaries not from base. > > > > Summary of the issue with ino_t is that it is not too hard to fix the > > kernel, comparing with the ABI issues which must be solved in > > usermode. > > > > > Yes, I was just going to look at d_fileno as a starting point. > (For whatever reason d_fileno isn't defined as ino_t?) > > I was specifically avoiding any use of "ino_t" and saw it as something > that needed to eventually change to 64 bits at the very end. > I was aware of Gleb Kurtsou's work, but didn't realize it lived > in projects/ino64 and he had mentioned that he was busy, but > would try and find time to update the patch. 
> I will look at projects/ino64 and it sounds like Kirk > would like to figure it all out in projects/ino64 and > eventually do a "super patch" to head. This sounds fine > to me, if we can pull it off. > > rick From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 16:21:59 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 72F718D5 for ; Sat, 22 Nov 2014 16:21:59 +0000 (UTC) Received: from relay02.pair.com (relay02.pair.com [209.68.5.16]) by mx1.freebsd.org (Postfix) with SMTP id 1298E330 for ; Sat, 22 Nov 2014 16:21:58 +0000 (UTC) Received: (qmail 20976 invoked from network); 22 Nov 2014 16:21:57 -0000 Received: from 188.182.139.176 (HELO x2.osted.lan) (188.182.139.176) by relay02.pair.com with SMTP; 22 Nov 2014 16:21:57 -0000 X-pair-Authenticated: 188.182.139.176 Received: from x2.osted.lan (localhost [127.0.0.1]) by x2.osted.lan (8.14.9/8.14.9) with ESMTP id sAMGLtd5098548 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Sat, 22 Nov 2014 17:21:55 +0100 (CET) (envelope-from pho@x2.osted.lan) Received: (from pho@localhost) by x2.osted.lan (8.14.9/8.14.9/Submit) id sAMGLtwL098547 for freebsd-fs@freebsd.org; Sat, 22 Nov 2014 17:21:55 +0100 (CET) (envelope-from pho) Date: Sat, 22 Nov 2014 17:21:55 +0100 From: Peter Holm To: FreeBSD Filesystems Subject: Possible memory corruption with FUSE Message-ID: <20141122162155.GA98201@x2.osted.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Nov 2014 16:21:59 -0000 While testing a fix I came across a 
deadlock which was caused by memory corruption. FUSE was a suspect, as fuse.ko had been loaded and unloaded at one point earlier in the test (no actual fuse tests were done). A test was constructed that loaded and unloaded fuse.ko under VM pressure. This triggers a page fault in the page daemon: http://people.freebsd.org/~pho/stress/log/fuse3.txt - Peter From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 16:22:04 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DE18D93C for ; Sat, 22 Nov 2014 16:22:04 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2001:1868:205::9]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C9660334 for ; Sat, 22 Nov 2014 16:22:04 +0000 (UTC) Received: from hch by bombadil.infradead.org with local (Exim 4.80.1 #2 (Red Hat Linux)) id 1XsDRh-00024l-4J; Sat, 22 Nov 2014 16:22:01 +0000 Date: Sat, 22 Nov 2014 08:22:01 -0800 From: Christoph Hellwig To: Konstantin Belousov Subject: Re: RFC: patch to make d_fileno 64bits Message-ID: <20141122162201.GA27585@infradead.org> References: <20141121155754.GN17068@kib.kiev.ua> <420608613.5215411.1416613552066.JavaMail.root@uoguelph.ca> <20141122153427.GW17068@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141122153427.GW17068@kib.kiev.ua> User-Agent: Mutt/1.5.23 (2014-03-12) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org See http://www.infradead.org/rpr.html Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: 
Sat, 22 Nov 2014 16:22:05 -0000 On Sat, Nov 22, 2014 at 05:34:27PM +0200, Konstantin Belousov wrote: > For syscalls 156 and 196 (and some more), the converter must be written > in the vfs_syscalls.c which translates the new dirents into old dirents, > at the level of best efforts. FYI, you might want to look at the high level construct we use for that in Linux, where we pass a function pointer to format the dirent to the VOP_READDIR equivalent. The function pointer is passed by the caller and can format all kinds of different dirent structures. This hasn't just been helpful for the 64bit ino dirent transition, but also for foreign OS compatibility layers and in-kernel consumers like nfsd. From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 17:23:51 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A0FC26F5 for ; Sat, 22 Nov 2014 17:23:51 +0000 (UTC) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 81C65B96 for ; Sat, 22 Nov 2014 17:23:51 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id sAMHNfs6005778; Sat, 22 Nov 2014 09:23:41 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201411221723.sAMHNfs6005778@chez.mckusick.com> To: Konstantin Belousov Subject: Re: RFC: patch to make d_fileno 64bits In-reply-to: <20141122153427.GW17068@kib.kiev.ua> Date: Sat, 22 Nov 2014 09:23:41 -0800 From: Kirk McKusick Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Sat, 22 Nov 2014 17:23:51 -0000 > Date: Sat, 22 Nov 2014 17:34:27 +0200 > From: Konstantin Belousov > To: Rick Macklem > Subject: Re: RFC: patch to make d_fileno 64bits > Cc: FreeBSD Filesystems > > On Fri, Nov 21, 2014 at 06:45:52PM -0500, Rick Macklem wrote: >> Kostik wrote: >> What about old binaries that do getdirentries(2) and expect the old >> structure with 32bit d_fileno or the linux compatibility stuff? >> I suspect that there are some old staticly linked binaries out there >> that does/expects the old getdirentries. > > No, let me restate my position. There are two places for backward > compatibility, one is in-kernel binary interface, and another is > applications ,i.e. KBI and ABI. > > My opinion is that we must provide strict backward ABI compatibility > to have even right to be called useful OS. In particular, the syscalls > like current getdirentries (156 and 196) providing 32-bit inonums, must > be kept with their current binary contract. The userspace issues do > not end there, but this is not the currently discussed item. > > On the other hand, providing KBI compat for filesystems which work > right now with 32bit inode numbers, should not be done. I.e., no > VOP_READDIR_32INO(), all filesystems must be converted once. > > For syscalls 156 and 196 (and some more), the converter must be written > in the vfs_syscalls.c which translates the new dirents into old dirents, > at the level of best efforts. I believe that we are all in agreement with you on the kernel approach at this point. Do we have a way of versioning libc so that we can have the old version that provides the 32-bit version of the syscalls (156 and 196) along with 32-bit higher-level functions like fts and friends and then a new libc version that has the 64-bit version of the syscalls and other higher-level functions? 
Kirk From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 17:55:37 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 588C99A3 for ; Sat, 22 Nov 2014 17:55:37 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id EEF33E6C for ; Sat, 22 Nov 2014 17:55:36 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sAMHtVkq098530 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 22 Nov 2014 19:55:31 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sAMHtVkq098530 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sAMHtV6f098529; Sat, 22 Nov 2014 19:55:31 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Nov 2014 19:55:31 +0200 From: Konstantin Belousov To: Kirk McKusick Subject: Re: RFC: patch to make d_fileno 64bits Message-ID: <20141122175531.GZ17068@kib.kiev.ua> References: <20141122153427.GW17068@kib.kiev.ua> <201411221723.sAMHNfs6005778@chez.mckusick.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201411221723.sAMHNfs6005778@chez.mckusick.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 
2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Nov 2014 17:55:37 -0000 On Sat, Nov 22, 2014 at 09:23:41AM -0800, Kirk McKusick wrote: > > Date: Sat, 22 Nov 2014 17:34:27 +0200 > > From: Konstantin Belousov > > To: Rick Macklem > > Subject: Re: RFC: patch to make d_fileno 64bits > > Cc: FreeBSD Filesystems > > > > On Fri, Nov 21, 2014 at 06:45:52PM -0500, Rick Macklem wrote: > >> Kostik wrote: > >> What about old binaries that do getdirentries(2) and expect the old > >> structure with 32bit d_fileno or the linux compatibility stuff? > >> I suspect that there are some old staticly linked binaries out there > >> that does/expects the old getdirentries. > > > > No, let me restate my position. There are two places for backward > > compatibility, one is in-kernel binary interface, and another is > > applications ,i.e. KBI and ABI. > > > > My opinion is that we must provide strict backward ABI compatibility > > to have even right to be called useful OS. In particular, the syscalls > > like current getdirentries (156 and 196) providing 32-bit inonums, must > > be kept with their current binary contract. The userspace issues do > > not end there, but this is not the currently discussed item. > > > > On the other hand, providing KBI compat for filesystems which work > > right now with 32bit inode numbers, should not be done. I.e., no > > VOP_READDIR_32INO(), all filesystems must be converted once. > > > > For syscalls 156 and 196 (and some more), the converter must be written > > in the vfs_syscalls.c which translates the new dirents into old dirents, > > at the level of best efforts. > > I believe that we are all in agreement with you on the kernel approach > at this point. Well, I think this was the Rick patch and proposal to have compat ino32 in kernel. 
> > Do we have a way of versioning libc so that we can have the old version > that provides the 32-bit version of the syscalls (156 and 196) along > with 32-bit higher-level functions like fts and friends and then a new > libc version that has the 64-bit version of the syscalls and other > higher-level functions? We do not need several versions of libc. We support symbol versioning, i.e. we can have old getdirents symbol which resolves to syscall stub for 196, and new getdirents for new syscall. It is somewhat convoluted feature, you could look at example in sys/kern/sysv_*.c, for instance, freebsd7_shmctl and shmctl. Also look at libc/include/compat.h. For pure usermode compat shims, lib/libc/gen/fts-compat.c was already handled one time. I promise to write neccessary magic for libc versioning when needed. As I explained before, unfortunately the libc is not the final point for the userspace compat. From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 17:56:26 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B893FB35 for ; Sat, 22 Nov 2014 17:56:26 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2A53EE72 for ; Sat, 22 Nov 2014 17:56:25 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sAMHuL7t099299 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 22 Nov 2014 19:56:21 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sAMHuL7t099299 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sAMHuK9O099298; Sat, 22 Nov 2014 19:56:20 +0200 
(EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Nov 2014 19:56:20 +0200 From: Konstantin Belousov To: Christoph Hellwig Subject: Re: RFC: patch to make d_fileno 64bits Message-ID: <20141122175620.GA17068@kib.kiev.ua> References: <20141121155754.GN17068@kib.kiev.ua> <420608613.5215411.1416613552066.JavaMail.root@uoguelph.ca> <20141122153427.GW17068@kib.kiev.ua> <20141122162201.GA27585@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141122162201.GA27585@infradead.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Nov 2014 17:56:26 -0000 On Sat, Nov 22, 2014 at 08:22:01AM -0800, Christoph Hellwig wrote: > On Sat, Nov 22, 2014 at 05:34:27PM +0200, Konstantin Belousov wrote: > > For syscalls 156 and 196 (and some more), the converter must be written > > in the vfs_syscalls.c which translates the new dirents into old dirents, > > at the level of best efforts. > > FYI, you might want to look at the high level construct we use for that > in Linux, where we pass a function pointer to format the dirent to the > VOP_READDIR equivalent. The function pointer is passed by the caller > and can format all kinds of different dirent stuctures. This hasn't > just been helpful for the 64bit ino dirent transition, but also for > foreign OS compatibility layers and in-kernel consumers like nfsd. Yes, this is very promising approach, I agree. Thank you for the pointer. 
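
The callback idea discussed above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the Linux interface and not anything from the proposed FreeBSD patch: every name here (sk_dirent32, sk_dirent64, sk_emit_ctx, sk_emit_fn, sk_readdir) is hypothetical, the record layouts are simplified stand-ins for the real struct dirent, and "best effort" is assumed to mean truncating the inode number to 32 bits while counting how often that loses information. The directory walker never formats a record itself; the caller chooses the formatter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_NAMELEN 32	/* hypothetical limit, kept small for the sketch */

/* Hypothetical record layouts; not the real struct dirent definitions. */
struct sk_dirent64 {
	uint64_t d_fileno;
	char     d_name[SK_NAMELEN];
};

struct sk_dirent32 {
	uint32_t d_fileno;
	char     d_name[SK_NAMELEN];
};

/* Output cursor shared by every formatter. */
struct sk_emit_ctx {
	unsigned char *buf;		/* caller-supplied buffer */
	size_t         size;		/* total size of buf */
	size_t         used;		/* bytes written so far */
	unsigned       truncated;	/* entries whose inode number lost bits */
};

/* Formatter: returns 0 on success, -1 if the entry does not fit. */
typedef int (*sk_emit_fn)(struct sk_emit_ctx *, uint64_t, const char *);

/* New-style formatter: keeps the full 64-bit inode number. */
static int
sk_emit64(struct sk_emit_ctx *ctx, uint64_t ino, const char *name)
{
	struct sk_dirent64 de;

	if (ctx->size - ctx->used < sizeof(de) || strlen(name) >= SK_NAMELEN)
		return (-1);
	memset(&de, 0, sizeof(de));
	de.d_fileno = ino;
	strcpy(de.d_name, name);
	memcpy(ctx->buf + ctx->used, &de, sizeof(de));
	ctx->used += sizeof(de);
	return (0);
}

/* Compat formatter: truncates to 32 bits (assumed "best effort" policy). */
static int
sk_emit32(struct sk_emit_ctx *ctx, uint64_t ino, const char *name)
{
	struct sk_dirent32 de;

	if (ctx->size - ctx->used < sizeof(de) || strlen(name) >= SK_NAMELEN)
		return (-1);
	memset(&de, 0, sizeof(de));
	de.d_fileno = (uint32_t)ino;
	if (de.d_fileno != ino)
		ctx->truncated++;
	strcpy(de.d_name, name);
	memcpy(ctx->buf + ctx->used, &de, sizeof(de));
	ctx->used += sizeof(de);
	return (0);
}

/* Stand-in for a filesystem's directory contents. */
struct sk_entry {
	uint64_t    ino;
	const char *name;
};

/* Stand-in for the readdir operation: walks entries, delegates formatting. */
static int
sk_readdir(const struct sk_entry *ents, size_t n, sk_emit_fn emit,
    struct sk_emit_ctx *ctx)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (emit(ctx, ents[i].ino, ents[i].name) != 0)
			return (-1);
	return (0);
}
```

With this shape, an old syscall entry point would pass sk_emit32 and a new one sk_emit64, and an in-kernel consumer could supply its own formatter without the filesystem code changing.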
From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 18:13:02 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 746F1511 for ; Sat, 22 Nov 2014 18:13:02 +0000 (UTC) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4F9B39E for ; Sat, 22 Nov 2014 18:13:02 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id sAMICsv8016811; Sat, 22 Nov 2014 10:12:54 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201411221812.sAMICsv8016811@chez.mckusick.com> To: Konstantin Belousov Subject: Re: RFC: patch to make d_fileno 64bits In-reply-to: <20141122175531.GZ17068@kib.kiev.ua> Date: Sat, 22 Nov 2014 10:12:54 -0800 From: Kirk McKusick Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Nov 2014 18:13:02 -0000 > Date: Sat, 22 Nov 2014 19:55:31 +0200 > From: Konstantin Belousov > To: Kirk McKusick > Cc: Rick Macklem , > FreeBSD Filesystems > Subject: Re: RFC: patch to make d_fileno 64bits > >> Do we have a way of versioning libc so that we can have the old version >> that provides the 32-bit version of the syscalls (156 and 196) along >> with 32-bit higher-level functions like fts and friends and then a new >> libc version that has the 64-bit version of the syscalls and other >> higher-level functions? > > We do not need several versions of libc. We support symbol versioning, > i.e. 
we can have old getdirents symbol which resolves to syscall stub > for 196, and new getdirents for new syscall. > > It is somewhat convoluted feature, you could look at example in > sys/kern/sysv_*.c, for instance, freebsd7_shmctl and shmctl. Also > look at libc/include/compat.h. For pure usermode compat shims, > lib/libc/gen/fts-compat.c was already handled one time. > > I promise to write neccessary magic for libc versioning when needed. > As I explained before, unfortunately the libc is not the final point > for the userspace compat. What more beyond libc do we need to handle? Things like fts is in libc. Are there other libraries besides libc that embed the size of an ino_t? Kirk From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 18:44:11 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 62B41A0 for ; Sat, 22 Nov 2014 18:44:11 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E471B390 for ; Sat, 22 Nov 2014 18:44:10 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sAMIi5Mg009645 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 22 Nov 2014 20:44:05 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sAMIi5Mg009645 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sAMIi5lO009644; Sat, 22 Nov 2014 20:44:05 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 22 Nov 2014 20:44:05 +0200 From: Konstantin Belousov To: Kirk McKusick Subject: Re: RFC: patch to make 
d_fileno 64bits Message-ID: <20141122184405.GC17068@kib.kiev.ua> References: <20141122175531.GZ17068@kib.kiev.ua> <201411221812.sAMICsv8016811@chez.mckusick.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201411221812.sAMICsv8016811@chez.mckusick.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: FreeBSD Filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Nov 2014 18:44:11 -0000 On Sat, Nov 22, 2014 at 10:12:54AM -0800, Kirk McKusick wrote: > > Date: Sat, 22 Nov 2014 19:55:31 +0200 > > From: Konstantin Belousov > > To: Kirk McKusick > > Cc: Rick Macklem , > > FreeBSD Filesystems > > Subject: Re: RFC: patch to make d_fileno 64bits > > > >> Do we have a way of versioning libc so that we can have the old version > >> that provides the 32-bit version of the syscalls (156 and 196) along > >> with 32-bit higher-level functions like fts and friends and then a new > >> libc version that has the 64-bit version of the syscalls and other > >> higher-level functions? > > > > We do not need several versions of libc. We support symbol versioning, > > i.e. we can have old getdirents symbol which resolves to syscall stub > > for 196, and new getdirents for new syscall. > > > > It is somewhat convoluted feature, you could look at example in > > sys/kern/sysv_*.c, for instance, freebsd7_shmctl and shmctl. Also > > look at libc/include/compat.h. For pure usermode compat shims, > > lib/libc/gen/fts-compat.c was already handled one time. > > > > I promise to write neccessary magic for libc versioning when needed. 
> > As I explained before, unfortunately the libc is not the final point > > for the userspace compat. > > What more beyond libc do we need to handle? Things like fts is in libc. > Are there other libraries besides libc that embed the size of an ino_t? Answering the question is part of the project. Scientific method to look at this is to do two builds of the world with the debugging information enabled, one stock, and one with the ino_t changed to 64bit. Then, tools like abi-compliance-checker or some set of scripts we have somewhere in tree (I do not remember exactly), can compare the dwarf definitions of the structures. This will provide the definitive answer to the question. General feeling is that a lot should be affected, since struct stat includes inode number. Quick look at the sources picks up references to ino_t in libarchive, libprocstat, libufs external interfaces. From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 21:11:52 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EC979CC for ; Sat, 22 Nov 2014 21:11:52 +0000 (UTC) Received: from mail-wi0-x232.google.com (mail-wi0-x232.google.com [IPv6:2a00:1450:400c:c05::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6E9C7604 for ; Sat, 22 Nov 2014 21:11:52 +0000 (UTC) Received: by mail-wi0-f178.google.com with SMTP id hi2so2366667wib.11 for ; Sat, 22 Nov 2014 13:11:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=y+Y6jTMhmeQmnqG5r0KB3py/vAmGGcPeUYi5syCDFxg=; 
b=MIB/qx9X0HuTGhMHsMaLJwGv0QqTAJLI52DxJPXOMuqMxhVSqO8LDU4qbcB9qtBA6o C7eZO4dvCY0cq2TOxa3hDQRXOSFbP8Xb0PyLip3pY3KRzER16B8y+JDCVqpwY46luygO zI0Px/t8lwRHtuatYAV/RLnUT3CwTZyr+87wQ6qAJ+CXlqoflNtkGrr8FPtwUoUEFPsr sbwYKO7iQzoN1B00RjQLnlA+Cjknnjxz78AAR7bDuosDeaPhCWr9K+FinxayGsX+pf5w buS62J49A/TZehJpnYu8R6QRfoG3U2VUxINRQSP7TUPWHGSRppIJDdemG0Z0OJlxxlT5 F3Yw== X-Received: by 10.194.85.83 with SMTP id f19mr20605666wjz.20.1416690710836; Sat, 22 Nov 2014 13:11:50 -0800 (PST) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by mx.google.com with ESMTPSA id he3sm13651054wjc.15.2014.11.22.13.11.49 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sat, 22 Nov 2014 13:11:50 -0800 (PST) Date: Sat, 22 Nov 2014 22:11:47 +0100 From: Mateusz Guzik To: Konstantin Belousov Subject: Re: atomic v_usecount and v_holdcnt Message-ID: <20141122211147.GA23623@dft-labs.eu> References: <20141122002812.GA32289@dft-labs.eu> <20141122092527.GT17068@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20141122092527.GT17068@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Nov 2014 21:11:53 -0000

On Sat, Nov 22, 2014 at 11:25:27AM +0200, Konstantin Belousov wrote:
> On Sat, Nov 22, 2014 at 01:28:12AM +0100, Mateusz Guzik wrote:
> > The idea is that we don't need an interlock as long as we don't
> > transition either counter 1->0 or 0->1.
> I already said that something along the lines of the patch should work.
> In fact, you need vnode lock when hold count changes between 0 and 1,
> and probably the same for use count.
>

I don't see why this would be required (not that I'm a VFS expert).
vnode recycling seems to be protected with the interlock.
In fact I would argue that if this is really needed, the current code is
buggy. The interlock is taken in e.g. vgone with the vnode already locked,
so for cases where we get interlock -> lock, the kernel has to drop the
former before blocking in order to avoid deadlocks. And this opens the
same window that is present with my patch.

> Some notes about the patch.
>
> mtx_owned() braces are intolerable ugliness. You should either pass a
> boolean flag (preferred), or create locked/unlocked versions of the
> functions.
>

That was a temporary hack.

> Similarly, I dislike vget_held(). Add a flag to vget(), see LK_EATTR_MASK
> in sys/lockmgr.h.
>

lockmgr has no business knowing or not knowing whether we held the vnode,
so the flag would have to be cleared before it is passed to it. But then
it seems like an abuse of the LK_* namespace. But maybe I misunderstood
your proposal.

> Could there be consequences of not taking vnode interlock and passing
> LK_INTERLOCK to vn_lock() in vget() ?
>

You mean to add an assertion? I did it in the new patch.

> Taking interlock when vnode lock is already owned is probably fine and
> does not add to contention. I mean that making VI_OWEINACT so loose
> breaks the VOP_INACTIVE() contract.

namecache typically locks vnodes shared and in such cases vinactive is
not executed and only OWEINACT is cleared.

And it is a serialisation point. I just tested with multiple stats in the
same directory and it went from ~1100000 to ~620000 ops/s.

I can add unconditional locking of the interlock for exclusively locked
vnodes if you insist, but for shared ones I don't see any benefit.

Patch below should be split in 3, but imho is sufficiently readable to be
reviewed in one batch.

1. It adds refcount_{acquire,release}_if_greater functions to replace
   open coded atomic_cmpset_int loops.
2. v_rdev->si_usecount manipulation is moved to v_{incr,decr}_devcount.
3.
actual switch to atomics + some assertions diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c b/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c index 83f29c1..b587ebd 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/vnode.c @@ -99,6 +99,6 @@ vn_rele_async(vnode_t *vp, taskq_t *taskq) (task_func_t *)vn_rele_inactive, vp, TQ_SLEEP) != 0); return; } - vp->v_usecount--; + refcount_release(&vp->v_usecount); vdropl(vp); } diff --git a/sys/kern/vfs_cache.c b/sys/kern/vfs_cache.c index 55e3217..50a84d8 100644 --- a/sys/kern/vfs_cache.c +++ b/sys/kern/vfs_cache.c @@ -665,12 +665,12 @@ success: ltype = VOP_ISLOCKED(dvp); VOP_UNLOCK(dvp, 0); } - VI_LOCK(*vpp); + vhold(*vpp); if (wlocked) CACHE_WUNLOCK(); else CACHE_RUNLOCK(); - error = vget(*vpp, cnp->cn_lkflags | LK_INTERLOCK, cnp->cn_thread); + error = vget_held(*vpp, cnp->cn_lkflags, cnp->cn_thread); if (cnp->cn_flags & ISDOTDOT) { vn_lock(dvp, ltype | LK_RETRY); if (dvp->v_iflag & VI_DOOMED) { @@ -1376,9 +1376,9 @@ vn_dir_dd_ino(struct vnode *vp) if ((ncp->nc_flag & NCF_ISDOTDOT) != 0) continue; ddvp = ncp->nc_dvp; - VI_LOCK(ddvp); + vhold(ddvp); CACHE_RUNLOCK(); - if (vget(ddvp, LK_INTERLOCK | LK_SHARED | LK_NOWAIT, curthread)) + if (vget_held(ddvp, LK_SHARED | LK_NOWAIT, curthread)) return (NULL); return (ddvp); } diff --git a/sys/kern/vfs_hash.c b/sys/kern/vfs_hash.c index 0271e49..d2fdbba 100644 --- a/sys/kern/vfs_hash.c +++ b/sys/kern/vfs_hash.c @@ -83,9 +83,9 @@ vfs_hash_get(const struct mount *mp, u_int hash, int flags, struct thread *td, s continue; if (fn != NULL && fn(vp, arg)) continue; - VI_LOCK(vp); + vhold(vp); mtx_unlock(&vfs_hash_mtx); - error = vget(vp, flags | LK_INTERLOCK, td); + error = vget_held(vp, flags, td); if (error == ENOENT && (flags & LK_NOWAIT) == 0) break; if (error) @@ -127,9 +127,9 @@ vfs_hash_insert(struct vnode *vp, u_int hash, int flags, struct thread *td, stru continue; if (fn != NULL && fn(vp2, arg)) 
continue; - VI_LOCK(vp2); + vhold(vp2); mtx_unlock(&vfs_hash_mtx); - error = vget(vp2, flags | LK_INTERLOCK, td); + error = vget_held(vp2, flags, td); if (error == ENOENT && (flags & LK_NOWAIT) == 0) break; mtx_lock(&vfs_hash_mtx); diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c index 345aad6..564b5af 100644 --- a/sys/kern/vfs_subr.c +++ b/sys/kern/vfs_subr.c @@ -68,6 +68,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include #include @@ -105,6 +106,8 @@ static void v_incr_usecount(struct vnode *); static void v_decr_usecount(struct vnode *); static void v_decr_useonly(struct vnode *); static void v_upgrade_usecount(struct vnode *); +static void v_incr_devcount(struct vnode *); +static void v_decr_devcount(struct vnode *); static void vnlru_free(int); static void vgonel(struct vnode *); static void vfs_knllock(void *arg); @@ -165,6 +168,10 @@ static int reassignbufcalls; SYSCTL_INT(_vfs, OID_AUTO, reassignbufcalls, CTLFLAG_RW, &reassignbufcalls, 0, "Number of calls to reassignbuf"); +static int vget_lock; +SYSCTL_INT(_vfs, OID_AUTO, vget_lock, CTLFLAG_RW, &vget_lock, 0, + "Lock the interlock unconditionally in vget"); + /* * Cache for the mount type id assigned to NFS. This is used for * special checks in nfs/nfs_nqlease.c and vm/vnode_pager.c. 
@@ -854,7 +861,7 @@ vnlru_free(int count) */ freevnodes--; vp->v_iflag &= ~VI_FREE; - vp->v_holdcnt++; + refcount_acquire(&vp->v_holdcnt); mtx_unlock(&vnode_free_list_mtx); VI_UNLOCK(vp); @@ -2050,14 +2057,10 @@ static void v_incr_usecount(struct vnode *vp) { + ASSERT_VI_UNLOCKED(vp, __func__); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - vholdl(vp); - vp->v_usecount++; - if (vp->v_type == VCHR && vp->v_rdev != NULL) { - dev_lock(); - vp->v_rdev->si_usecount++; - dev_unlock(); - } + vhold(vp); + v_upgrade_usecount(vp); } /* @@ -2068,13 +2071,14 @@ static void v_upgrade_usecount(struct vnode *vp) { + ASSERT_VI_UNLOCKED(vp, __func__); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - vp->v_usecount++; - if (vp->v_type == VCHR && vp->v_rdev != NULL) { - dev_lock(); - vp->v_rdev->si_usecount++; - dev_unlock(); + if (!refcount_acquire_if_greater(&vp->v_usecount, 0)) { + VI_LOCK(vp); + refcount_acquire(&vp->v_usecount); + VI_UNLOCK(vp); } + v_incr_devcount(vp); } /* @@ -2086,16 +2090,11 @@ static void v_decr_usecount(struct vnode *vp) { - ASSERT_VI_LOCKED(vp, __FUNCTION__); + ASSERT_VI_LOCKED(vp, __func__); VNASSERT(vp->v_usecount > 0, vp, ("v_decr_usecount: negative usecount")); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - vp->v_usecount--; - if (vp->v_type == VCHR && vp->v_rdev != NULL) { - dev_lock(); - vp->v_rdev->si_usecount--; - dev_unlock(); - } + v_decr_useonly(vp); vdropl(vp); } @@ -2109,11 +2108,35 @@ static void v_decr_useonly(struct vnode *vp) { - ASSERT_VI_LOCKED(vp, __FUNCTION__); + ASSERT_VI_LOCKED(vp, __func__); VNASSERT(vp->v_usecount > 0, vp, ("v_decr_useonly: negative usecount")); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - vp->v_usecount--; + refcount_release(&vp->v_usecount); + v_decr_devcount(vp); +} + +/* + * Increment si_usecount of the associated device, if any. 
+ */ +static void +v_incr_devcount(struct vnode *vp) +{ + + if (vp->v_type == VCHR && vp->v_rdev != NULL) { + dev_lock(); + vp->v_rdev->si_usecount++; + dev_unlock(); + } +} + +/* + * Decrement si_usecount of the associated device, if any. + */ +static void +v_decr_devcount(struct vnode *vp) +{ + if (vp->v_type == VCHR && vp->v_rdev != NULL) { dev_lock(); vp->v_rdev->si_usecount--; @@ -2129,19 +2152,19 @@ v_decr_useonly(struct vnode *vp) * vput try to do it here. */ int -vget(struct vnode *vp, int flags, struct thread *td) +vget_held(struct vnode *vp, int flags, struct thread *td) { int error; - error = 0; VNASSERT((flags & LK_TYPE_MASK) != 0, vp, ("vget: invalid lock operation")); + if ((flags & LK_INTERLOCK) != 0) + ASSERT_VI_LOCKED(vp, __func__); + else + ASSERT_VI_UNLOCKED(vp, __func__); CTR3(KTR_VFS, "%s: vp %p with flags %d", __func__, vp, flags); - if ((flags & LK_INTERLOCK) == 0) - VI_LOCK(vp); - vholdl(vp); - if ((error = vn_lock(vp, flags | LK_INTERLOCK)) != 0) { + if ((error = vn_lock(vp, flags)) != 0) { vdrop(vp); CTR2(KTR_VFS, "%s: impossible to lock vnode %p", __func__, vp); @@ -2149,7 +2172,6 @@ vget(struct vnode *vp, int flags, struct thread *td) } if (vp->v_iflag & VI_DOOMED && (flags & LK_RETRY) == 0) panic("vget: vn_lock failed to return ENOENT\n"); - VI_LOCK(vp); /* Upgrade our holdcnt to a usecount. */ v_upgrade_usecount(vp); /* @@ -2158,16 +2180,27 @@ vget(struct vnode *vp, int flags, struct thread *td) * here at preventing a reference to a removed file. If * we don't succeed no harm is done.
*/ - if (vp->v_iflag & VI_OWEINACT) { - if (VOP_ISLOCKED(vp) == LK_EXCLUSIVE && - (flags & LK_NOWAIT) == 0) - vinactive(vp, td); - vp->v_iflag &= ~VI_OWEINACT; + if (vget_lock || vp->v_iflag & VI_OWEINACT) { + VI_LOCK(vp); + if (vp->v_iflag & VI_OWEINACT) { + if (VOP_ISLOCKED(vp) == LK_EXCLUSIVE && + (flags & LK_NOWAIT) == 0) + vinactive(vp, td); + vp->v_iflag &= ~VI_OWEINACT; + } + VI_UNLOCK(vp); } - VI_UNLOCK(vp); return (0); } +int +vget(struct vnode *vp, int flags, struct thread *td) +{ + + _vhold(vp, (flags & LK_INTERLOCK) != 0); + return (vget_held(vp, flags, td)); +} + /* * Increase the reference count of a vnode. */ @@ -2176,9 +2209,7 @@ vref(struct vnode *vp) { CTR2(KTR_VFS, "%s: vp %p", __func__, vp); - VI_LOCK(vp); v_incr_usecount(vp); - VI_UNLOCK(vp); } /* @@ -2193,13 +2224,8 @@ vref(struct vnode *vp) int vrefcnt(struct vnode *vp) { - int usecnt; - - VI_LOCK(vp); - usecnt = vp->v_usecount; - VI_UNLOCK(vp); - return (usecnt); + return (vp->v_usecount); } #define VPUTX_VRELE 1 @@ -2218,12 +2244,19 @@ vputx(struct vnode *vp, int func) ASSERT_VOP_LOCKED(vp, "vput"); else KASSERT(func == VPUTX_VRELE, ("vputx: wrong func")); + ASSERT_VI_UNLOCKED(vp, __func__); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); + + if (refcount_release_if_greater(&vp->v_usecount, 1)) { + if (func == VPUTX_VPUT) + VOP_UNLOCK(vp, 0); + v_decr_devcount(vp); + vdrop(vp); + return; + } + VI_LOCK(vp); - /* Skip this v_writecount check if we're going to panic below. */ - VNASSERT(vp->v_writecount < vp->v_usecount || vp->v_usecount < 1, vp, - ("vputx: missed vn_close")); error = 0; if (vp->v_usecount > 1 || ((vp->v_iflag & VI_DOINGINACT) && @@ -2314,38 +2347,32 @@ vunref(struct vnode *vp) } /* - * Somebody doesn't want the vnode recycled. - */ -void -vhold(struct vnode *vp) -{ - - VI_LOCK(vp); - vholdl(vp); - VI_UNLOCK(vp); -} - -/* * Increase the hold count and activate if this is the first reference. 
*/ void -vholdl(struct vnode *vp) +_vhold(struct vnode *vp, bool locked) { struct mount *mp; + if (locked) + ASSERT_VI_LOCKED(vp, __func__); + else + ASSERT_VI_UNLOCKED(vp, __func__); CTR2(KTR_VFS, "%s: vp %p", __func__, vp); -#ifdef INVARIANTS - /* getnewvnode() calls v_incr_usecount() without holding interlock. */ - if (vp->v_type != VNON || vp->v_data != NULL) { - ASSERT_VI_LOCKED(vp, "vholdl"); - VNASSERT(vp->v_holdcnt > 0 || (vp->v_iflag & VI_FREE) != 0, - vp, ("vholdl: free vnode is held")); + if (refcount_acquire_if_greater(&vp->v_holdcnt, 0)) { + VNASSERT((vp->v_iflag & VI_FREE) == 0, vp, + ("_vhold: vnode with holdcnt is free")); + return; } -#endif - vp->v_holdcnt++; - if ((vp->v_iflag & VI_FREE) == 0) + if (!locked) + VI_LOCK(vp); + if ((vp->v_iflag & VI_FREE) == 0) { + refcount_acquire(&vp->v_holdcnt); + if (!locked) + VI_UNLOCK(vp); return; - VNASSERT(vp->v_holdcnt == 1, vp, ("vholdl: wrong hold count")); + } + VNASSERT(vp->v_holdcnt == 0, vp, ("vholdl: wrong hold count")); VNASSERT(vp->v_op != NULL, vp, ("vholdl: vnode already reclaimed.")); /* * Remove a vnode from the free list, mark it as in use, @@ -2362,18 +2389,9 @@ vholdl(struct vnode *vp) TAILQ_INSERT_HEAD(&mp->mnt_activevnodelist, vp, v_actfreelist); mp->mnt_activevnodelistsize++; mtx_unlock(&vnode_free_list_mtx); -} - -/* - * Note that there is one less who cares about this vnode. - * vdrop() is the opposite of vhold(). - */ -void -vdrop(struct vnode *vp) -{ - - VI_LOCK(vp); - vdropl(vp); + refcount_acquire(&vp->v_holdcnt); + if (!locked) + VI_UNLOCK(vp); } /* @@ -2382,20 +2400,28 @@ vdrop(struct vnode *vp) * (marked VI_DOOMED) in which case we will free it. 
  */
 void
-vdropl(struct vnode *vp)
+_vdrop(struct vnode *vp, bool locked)
 {
 	struct bufobj *bo;
 	struct mount *mp;
 	int active;
 
-	ASSERT_VI_LOCKED(vp, "vdropl");
+	if (locked)
+		ASSERT_VI_LOCKED(vp, __func__);
+	else
+		ASSERT_VI_UNLOCKED(vp, __func__);
 	CTR2(KTR_VFS, "%s: vp %p", __func__, vp);
 	if (vp->v_holdcnt <= 0)
 		panic("vdrop: holdcnt %d", vp->v_holdcnt);
-	vp->v_holdcnt--;
-	VNASSERT(vp->v_holdcnt >= vp->v_usecount, vp,
-	    ("hold count less than use count"));
-	if (vp->v_holdcnt > 0) {
+	if (refcount_release_if_greater(&vp->v_holdcnt, 1)) {
+		if (locked)
+			VI_UNLOCK(vp);
+		return;
+	}
+
+	if (!locked)
+		VI_LOCK(vp);
+	if (refcount_release(&vp->v_holdcnt) == 0) {
 		VI_UNLOCK(vp);
 		return;
 	}
diff --git a/sys/sys/refcount.h b/sys/sys/refcount.h
index 4611664..360d50d 100644
--- a/sys/sys/refcount.h
+++ b/sys/sys/refcount.h
@@ -64,4 +64,32 @@ refcount_release(volatile u_int *count)
 	return (old == 1);
 }
 
+static __inline int
+refcount_acquire_if_greater(volatile u_int *count, int val)
+{
+	int old;
+retry:
+	old = *count;
+	if (old > val) {
+		if (atomic_cmpset_int(count, old, old + 1))
+			return (true);
+		goto retry;
+	}
+	return (false);
+}
+
+static __inline int
+refcount_release_if_greater(volatile u_int *count, int val)
+{
+	int old;
+retry:
+	old = *count;
+	if (old > val) {
+		if (atomic_cmpset_int(count, old, old - 1))
+			return (true);
+		goto retry;
+	}
+	return (false);
+}
+
 #endif /* ! __SYS_REFCOUNT_H__ */
diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h
index c78b9d1..2aab48a 100644
--- a/sys/sys/vnode.h
+++ b/sys/sys/vnode.h
@@ -647,13 +647,16 @@ int vaccess_acl_posix1e(enum vtype type, uid_t file_uid,
 	    struct ucred *cred, int *privused);
 void vattr_null(struct vattr *vap);
 int vcount(struct vnode *vp);
-void vdrop(struct vnode *);
-void vdropl(struct vnode *);
+#define vdrop(vp) _vdrop((vp), 0)
+#define vdropl(vp) _vdrop((vp), 1)
+void _vdrop(struct vnode *, bool);
 int vflush(struct mount *mp, int rootrefs, int flags, struct thread *td);
 int vget(struct vnode *vp, int lockflag, struct thread *td);
+int vget_held(struct vnode *vp, int lockflag, struct thread *td);
 void vgone(struct vnode *vp);
-void vhold(struct vnode *);
-void vholdl(struct vnode *);
+#define vhold(vp) _vhold((vp), 0)
+#define vholdl(vp) _vhold((vp), 1)
+void _vhold(struct vnode *, bool);
 void vinactive(struct vnode *, struct thread *);
 int vinvalbuf(struct vnode *vp, int save, int slpflag, int slptimeo);
 int vtruncbuf(struct vnode *vp, struct ucred *cred, off_t length,

From owner-freebsd-fs@FreeBSD.ORG Sat Nov 22 23:56:30 2014
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTPS id 59C4A3EA
	for ; Sat, 22 Nov 2014 23:56:30 +0000 (UTC)
Received: from mail-oi0-x233.google.com (mail-oi0-x233.google.com [IPv6:2607:f8b0:4003:c06::233])
	(using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
	(Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK))
	by mx1.freebsd.org (Postfix) with ESMTPS id 27D61618
	for ; Sat, 22 Nov 2014 23:56:29 +0000 (UTC)
Received: by mail-oi0-f51.google.com with SMTP id e131so5256211oig.38
	for ; Sat, 22 Nov 2014 15:56:29 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:date:message-id:subject:from:to:content-type;
	bh=sh/MGhZkjysTRVaFEupVVNOwfJDE7hXdBVghmrV/mUk=;
	b=QByPRhx0s+KFGBkfmLnlf+J7hTDa7iydM4/AXFkD9hDozI2XjSCdA7ZRYp0i/JPoNQ
	9yhEBC5OMuNa8WNvv7Xi7msXPK+Rb9bFSZgflYIZ6jtZxiDQTBitZdlBGok0AcPrsGZQ
	JBwMB1/jDu59yM5yb3uCbkvKtqIPdX7EAYi2r4zSe8CzNcC4KPKRfIobXDRb+iRE0k3k
	nccOi3ogtRsTZqRnci4HTNvrsVY/lzvXlIyvVsbYpK9g/lZZOxk/02fXF8h1+8MKXfzg
	e+6nDRGZDSD6ZYvX7N8BeZHJjwmsMZ/9qq/aaekk3gBhV9160lrVCFROuj3dw4CYkfr8
	PTYQ==
MIME-Version: 1.0
X-Received: by 10.182.79.10 with SMTP id f10mr7951700obx.4.1416700589265;
	Sat, 22 Nov 2014 15:56:29 -0800 (PST)
Received: by 10.76.0.138 with HTTP; Sat, 22 Nov 2014 15:56:29 -0800 (PST)
Date: Sat, 22 Nov 2014 18:56:29 -0500
Message-ID:
Subject: When a ZFS error is not an error.
From: Zaphod Beeblebrox
To: freebsd-fs
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 22 Nov 2014 23:56:30 -0000

I have a file that ZFS claims is in error, but when I go through all
the effort to retrieve it, it is not in error.

ZFS says 405 files on this array are in error. Some are rather large,
and retrieving one block seems to take 30 seconds (i.e. hundreds of
hours to recover some files), so I'd like to ask if there's some way
to finesse this... or to fix ZFS.

To start, my array has errors like:

	NAME              STATE   READ WRITE CKSUM
	vr2               ONLINE     0     0   989
	  raidz1-0        ONLINE     0     0 1.93K
	    label/vr2-d0  ONLINE     0     0     0

(I've omitted the other lines ... they are all '0'). I asked what this
meant ... and the best answer I got was that the errors were not
assigned to any particular device.

So I learned how to use ZDB, and I have a patch for ZDB. Apparently
the deadlist can have a null in it that crashes ZDB. No matter.
We have this file in the output of zpool status -v:

	vr2/Audio@20080305-1450:/cds/service/02-Lord_Have_Mercy_Kyrie.mp3

... now even though it picks on the snapshot (not all of the -v
reports do), the following fails on the live filesystem too:

	[1:170:470]root@virtual:/vr1/tmp/diag> cp /vr2/Audio/cds/service/02-Lord_Have_Mercy_Kyrie.mp3 .
	cp: foo.mp3: Bad address

So I did this:

	for i in `grep L0 4351-dddddddd.txt | grep -v vr2/Audio | head -50 | cut -c22-34`; do
		cc=`printf %05d $count`
		echo getting $i 4035/b$cc
		time zdb -R vr2 $i:20000:r >4035/b$cc &
		count=$[count+1]
	done

Basically, 4351-dddddddd.txt is the output of zdb for that file (see
http://pastebin.com/tdqEJKJB), and the little script calls zdb to get
the first 20000 (hex) bytes of each block, because the remaining 4000
is the parity (9 disk array). Then I cat the blocks into one file and
truncate it to the specified length .... and lo and behold: the file
is sound.

So what's ZFS on about, not wanting to read this file? Help?
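The last step described above (concatenate the per-block dumps, then truncate to the file's real length) could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the exact commands that were run: the function name `reassemble` is hypothetical, the block dumps are assumed to be named `b00000`, `b00001`, ... in one directory (as the loop above would produce), and the true file length in bytes is assumed to be read off the zdb output by hand.

```shell
#!/bin/sh
# Hypothetical helper (not from the original message): rebuild a file
# from per-block dumps written by the zdb loop.
#   $1  directory holding the dumps, assumed to be named b00000, b00001, ...
#   $2  output file
#   $3  real file length in bytes, assumed to be taken from the zdb output
reassemble() {
	dir=$1
	out=$2
	length=$3

	# Zero-padded names make the glob expand in block order.
	cat "$dir"/b[0-9]* > "$out" || return 1

	# The last block is padded out to the full block size; cut the
	# padding off so the result has the file's real length.
	truncate -s "$length" "$out"
}
```

For example, `reassemble 4035 recovered.mp3 <length>` would write the concatenated blocks to `recovered.mp3`. It only works if every block was fetched successfully and the dumps sort in the right order, so it is worth checking the block count against the zdb listing first.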