From owner-freebsd-fs@FreeBSD.ORG Sun Jun 28 06:51:39 2009
Delivered-To: freebsd-fs@freebsd.org
References: <380571.31027.qm@web51002.mail.re2.yahoo.com>
Date: Sun, 28 Jun 2009 02:18:49 -0400
Message-ID: <5f67a8c40906272318t2f27822dg3e30f7dc2345cb11@mail.gmail.com>
From: Zaphod Beeblebrox
To: Dan Naumov
Cc: freebsd-fs@freebsd.org, Rafael Caesar Lenzi
Subject: Re: Adding more disk's on ZFS array
Content-Type: text/plain; charset=ISO-8859-1
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
X-List-Received-Date: Sun, 28 Jun 2009 06:51:39 -0000

On Sat, Jun 27, 2009 at 6:27 PM, Dan Naumov wrote:

> On Sun, Jun 28, 2009 at 12:58 AM, Rafael Caesar Lenzi wrote:
> >
> > Hi!
> > How i can add more disks on ZFS raid0 or raid5 array?
> >
> > Thanks!
> > Rafael Lenzi
>
> raid0 (stripe): zpool add POOLNAME DEVICENAME
> raid5 (raidz): you can't
>

Not entirely true. For a RAID 0 stripe, yes, you can just add a disk.
Be aware, however, that existing data is not restriped onto that disk;
the new disk is only used for new data.

For RAID 5 (raidz), you have two options. You can replace each disk, in
turn, with a larger disk and heal (resilver) the array each time. I did
this, for instance, to move from five 750 GB drives to five 1.5 TB
drives.

Another option is to add a second set of RAID 5 drives. If you have 5
existing drives in RAID 5, you can add another set of drives with
zpool add. According to the documentation, each set of disks in a pool
should be of the same RAID type. It doesn't, however, specify that each
set of RAID 5 disks should have the same number of disks in it. This
seems to mean that you could add a set of 3 disks (raid 5) to an
existing raid 5 array with 5 disks.
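
As a rough illustration of the two raidz growth options described
above, here is a minimal C sketch of the capacity arithmetic. The
function raidz_usable_gb and its parameters are made up for this
example and are not part of any ZFS API. It assumes that every member
of a raidz set contributes only as much space as the smallest member,
and it ignores metadata and padding overhead, so treat the numbers as
approximate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Rough usable capacity of a single raidz set, in GB.
 *
 * Assumptions (hypothetical helper, not a ZFS API):
 *  - every member contributes only as much as the smallest member;
 *  - `parity` members' worth of space is consumed by parity
 *    (1 for raidz, 2 for raidz2);
 *  - metadata, padding and reserved space are ignored.
 */
uint64_t
raidz_usable_gb(const uint64_t *disk_gb, size_t ndisks, size_t parity)
{
	uint64_t smallest;
	size_t i;

	assert(disk_gb != NULL && ndisks > parity);

	/* The smallest member limits what every member can contribute. */
	smallest = disk_gb[0];
	for (i = 1; i < ndisks; i++) {
		if (disk_gb[i] < smallest)
			smallest = disk_gb[i];
	}
	return ((uint64_t)(ndisks - parity) * smallest);
}
```

Under these assumptions, five 750 GB disks give about 3000 GB, the
figure stays at 3000 GB while any 750 GB disk remains in the set, and
it becomes about 6000 GB once all five are 1.5 TB. Adding a separate
3-disk set of 1.5 TB drives would contribute roughly another 3000 GB,
since the pool's space is the sum over its sets.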
From owner-freebsd-fs@FreeBSD.ORG Sun Jun 28 08:16:23 2009
Message-ID: <4A4725FA.80505@modulus.org>
Date: Sun, 28 Jun 2009 18:12:42 +1000
From: Andrew Snow
To: Dan Naumov
Cc: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID

> Contiguous Write Performance:
> http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-contig-write.png

What confuses me about these results is that the '5 disk' performance
was barely higher than the 'single disk' performance. All figures are
also lower than I get from a single modern SATA disk.

My own testing with dd from /dev/zero with FreeBSD ZFS on an Intel
ICH10 chipset motherboard with a Core2duo 2.66GHz showed RAIDZ
performance scaling linearly with the number of disks:

What            Write   Read
--------------------------------
7 disk RAIDZ2   220     305
6 disk RAIDZ2   173     260
5 disk RAIDZ2   120     213

Only the on-board controllers were used, with Seagate disks of around
250GB capacity. System had 8GB RAM.

These results are so different in absolute terms to your results that
I don't know how to interpret your set.

- Andrew

From owner-freebsd-fs@FreeBSD.ORG Sun Jun 28 09:08:21 2009
Message-ID: <4A4732F0.3060802@jrv.org>
Date: Sun, 28 Jun 2009 04:08:00 -0500
From: "James R. Van Artsdalen"
To: Zaphod Beeblebrox
References: <380571.31027.qm@web51002.mail.re2.yahoo.com>
 <5f67a8c40906272318t2f27822dg3e30f7dc2345cb11@mail.gmail.com>
In-Reply-To: <5f67a8c40906272318t2f27822dg3e30f7dc2345cb11@mail.gmail.com>
Cc: freebsd-fs, Rafael Caesar Lenzi
Subject: Re: Adding more disk's on ZFS array

Zaphod Beeblebrox wrote:
> Not entirely true. For a RAID 0 stripe, yes, you can just add a disk. Be
> clear, however, that existing data is not striped to that disk, but the new
> disk is used for new data.
>

It's probably best not to think of a ZFS pool as a RAID 0 but instead
as a set of vdev storage areas. All of the vdevs are candidates for
new data writes, depending on free space, etc.

> According to
> documentation, each pool should be of the same RAID type. It doesn't,
> however, specify that each set of RAID 5 disks should have the same number
> of disks in it. This seems to mean that you could add a set of 3 disks
> (raid 5) to an existing raid 5 array with 5 disks.
>

A pool is a set of vdevs, and different vdevs may be of a different
type and have different characteristics. It is perfectly reasonable to
create a pool with a single RAIDZ vdev and later add MIRROR vdevs, or
any other kind of vdev.

I prefer to use MIRRORs as the vdevs since it's easier to control
exposure to various failure modes (power supply, enclosure, controller
& disk firmware, etc).
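
To make the "a pool is a set of vdevs" view above concrete, here is a
small C model of a pool built from vdevs of different kinds. All of the
types and function names (vdev_kind, struct vdev, pool_usable_gb and so
on) are invented for this sketch and are not libzfs types. It assumes
that pool space is the sum of the vdevs' usable space, that losing any
one vdev loses the pool, and it ignores metadata overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types for illustration; these are not libzfs types. */
enum vdev_kind { VDEV_DISK, VDEV_MIRROR, VDEV_RAIDZ1, VDEV_RAIDZ2 };

struct vdev {
	enum vdev_kind kind;
	size_t ndisks;		/* number of member disks */
	uint64_t disk_gb;	/* size of the smallest member, in GB */
};

/* Member failures a single vdev can survive without losing data. */
size_t
vdev_failures_tolerated(const struct vdev *v)
{
	assert(v != NULL && v->ndisks >= 1);
	switch (v->kind) {
	case VDEV_MIRROR:
		return (v->ndisks - 1);
	case VDEV_RAIDZ1:
		return (1);
	case VDEV_RAIDZ2:
		return (2);
	case VDEV_DISK:
	default:
		return (0);
	}
}

/* Approximate usable space of one vdev, ignoring metadata overhead. */
uint64_t
vdev_usable_gb(const struct vdev *v)
{
	assert(v != NULL && v->ndisks >= 1);
	switch (v->kind) {
	case VDEV_RAIDZ1:
		assert(v->ndisks > 1);
		return ((uint64_t)(v->ndisks - 1) * v->disk_gb);
	case VDEV_RAIDZ2:
		assert(v->ndisks > 2);
		return ((uint64_t)(v->ndisks - 2) * v->disk_gb);
	case VDEV_MIRROR:
	case VDEV_DISK:
	default:
		/* A mirror or plain disk holds one disk's worth of data. */
		return (v->disk_gb);
	}
}

/* Pool space is modelled as the sum over all vdevs. */
uint64_t
pool_usable_gb(const struct vdev *vdevs, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += vdev_usable_gb(&vdevs[i]);
	return (total);
}

/*
 * Losing any single vdev loses the pool, so the number of disk
 * failures the pool is guaranteed to survive is the minimum over
 * its vdevs.
 */
size_t
pool_failures_tolerated(const struct vdev *vdevs, size_t n)
{
	size_t i, t, min;

	assert(vdevs != NULL && n > 0);
	min = vdev_failures_tolerated(&vdevs[0]);
	for (i = 1; i < n; i++) {
		t = vdev_failures_tolerated(&vdevs[i]);
		if (t < min)
			min = t;
	}
	return (min);
}
```

In this model, a 5-disk RAIDZ vdev of 750 GB disks plus a 2-disk MIRROR
of 1.5 TB disks gives roughly 4500 GB and still survives any single
disk failure. Adding one unprotected disk as a further vdev increases
the space but drops the guaranteed tolerance to zero.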
From owner-freebsd-fs@FreeBSD.ORG Sun Jun 28 10:30:27 2009
In-Reply-To: <4A4725FA.80505@modulus.org>
References: <4A4725FA.80505@modulus.org>
Date: Sun, 28 Jun 2009 13:30:26 +0300
From: Dan Naumov
To: Andrew Snow
Cc: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID

> What confuses me about these results is that the '5 disk' performance was
> barely higher than the 'single disk' performance. All figures are also
> lower than I get from a single modern SATA disk.
>
> My own testing with dd from /dev/zero with FreeBSD ZFS on an Intel ICH10
> chipset motherboard with Core2duo 2.66ghz showed RAIDZ performance scaling
> linearly with number of disks:
>
>
> What            Write   Read
> --------------------------------
> 7 disk RAIDZ2   220     305
> 6 disk RAIDZ2   173     260
> 5 disk RAIDZ2   120     213

What's confusing is that your results are actually out of place with
how ZFS numbers are supposed to look, not mine :) When using ZFS
RAIDZ, due to the way parity checking works in ZFS, your pool is
SUPPOSED to have throughput of the average single disk from that pool
and not some numbers growing skyhigh in a linear fashion.

The numbers that did surprise me the most were actually gmirror reads
(results posted earlier to this list): a geom gmirror is consistently
SLOWER for reading than a single disk (and it only gets progressively
worse the more disks you have in your gmirror). Read performance of
all other mirroring implementations pretty much scales up linearly
with the number of disks present in the mirror.

- Sincerely,
Dan Naumov

From owner-freebsd-fs@FreeBSD.ORG Sun Jun 28 10:39:55 2009
Message-ID: <4A4747A0.6040902@modulus.org>
Date: Sun, 28 Jun 2009 20:36:16 +1000
From: Andrew Snow
To: Dan Naumov
References: <4A4725FA.80505@modulus.org>
Cc: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID

> What's confusing is that your results are actually out of place with
> how ZFS numbers are supposed to look, not mine :) When using ZFS
> RAIDZ, due to the way parity checking works in ZFS, your pool is
> SUPPOSED to have throughput of the average single disk from that pool
> and not some numbers growing skyhigh in a linear fashion.

Could you please elaborate on this and explain it?

- Andrew

From owner-freebsd-fs@FreeBSD.ORG Sun Jun 28 11:02:04 2009
In-Reply-To: <4A4747A0.6040902@modulus.org>
References: <4A4725FA.80505@modulus.org> <4A4747A0.6040902@modulus.org>
Date: Sun, 28 Jun 2009 14:02:03 +0300
From: Dan Naumov
To: Andrew Snow
Cc: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID

"Now we come to the crucial decision ZFS has made for raidz and
raidz2: in raidz and raidz2, the data block is striped across all of
the disks. Instead of a model where a parity stripe is a bunch of data
blocks, each with an independent checksum, ZFS stripes a single data
block (and its parity), with a single checksum, across all the disks
(or as many of them as necessary).

This is a rational implementation decision, but when combined with the
need to verify checksums, it has an important consequence: in ZFS,
reads always involve all disks, because ZFS always must verify the
data block's checksum, which requires reading all of the data block,
which is spread across all of the drives. This is unlike normal RAID-5
or RAID-6, in which a small enough read will only touch one drive, and
means that adding more disks to a ZFS raidz pool does not increase how
many random reads you can do per second.

(A normal RAID-5 or RAID-6 array has a (theoretical) random read IO
capacity equal to the sum of the random IO operations rate of each of
the disks in the array, and so adding another disk adds its IOPs per
second to your read capacity. A ZFS raidz or raidz2 pool instead has a
capacity equal to the slowest disk's IOPs per second, and adding
another disk does nothing to help.
Effectively a raidz ZFS gives you a single disk's read IOPs per second
rate.)"

This was on the blog of a Sun engineer (in a post from a few years
ago). Unfortunately I don't have the link; I actually had to go
through my posting history on the Ars Technica forum to even find this
quote in the first place. If the situation has changed and the above
quote no longer holds true, it would be nice if someone more
knowledgeable about the performance implications could explain what
kind of performance is to be expected on a raidz system :)

- Sincerely,
Dan Naumov

On Sun, Jun 28, 2009 at 1:36 PM, Andrew Snow wrote:
>> What's confusing is that your results are actually out of place with
>> how ZFS numbers are supposed to look, not mine :) When using ZFS
>> RAIDZ, due to the way parity checking works in ZFS, your pool is
>> SUPPOSED to have throughput of the average single disk from that pool
>> and not some numbers growing skyhigh in a linear fashion.
>
> Could you please elaborate on this and explain it?
>
> - Andrew
>

From owner-freebsd-fs@FreeBSD.ORG Sun Jun 28 11:37:15 2009
Message-ID: <4A475511.5000700@modulus.org>
Date: Sun, 28 Jun 2009 21:33:37 +1000
From: Andrew Snow
To: Dan Naumov
References: <4A4725FA.80505@modulus.org> <4A4747A0.6040902@modulus.org>
Cc: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org
Subject: Re: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID

OK, I thought we were talking about a single-threaded sequential
write, which is what my benchmark was. It sounds like the graphs you
published were of multi-threaded writers - how many processes were
running in parallel in the case of the "Contiguous Write Performance"
here?

http://virtual.tehinterweb.net/livejournal/2009-06-22_zfs_diskperf/zfs-diskperf-contig-write.png

- Andrew

From owner-freebsd-fs@FreeBSD.ORG Sun Jun 28 17:54:28 2009
Message-ID: <4A47AE4A.6090705@hoyletech.com>
Date: Sun, 28 Jun 2009 13:54:18 -0400
From: Nathanael Hoyle
To: Dan Naumov
References: <4A4725FA.80505@modulus.org> <4A4747A0.6040902@modulus.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: read/write benchmarking: UFS2 vs ZFS vs EXT3 vs ZFS RAIDZ vs Linux MDRAID

The clear distinction between the two sets of performance tests you
two have done is that Dan's are highly random short i/o's, and
Andrew's are large sequential transfers. Large sequential transfers
necessarily engage all of the disks in the pool, regardless of the
parity strategy. The implied penalty for ZFS having to read the parity
data from all drives is therefore mostly theoretical, and RAIDZ
actually performs more like RAID 5 typically would.

In the case of Dan's highly random, short i/o's, the read itself is
trivial, which makes the overhead of spinning/seeking all the disks to
calculate and validate the full checksum inordinately high.

The implication of these two benchmarks is clear as well: ZFS RAIDZ
may be an excellent choice for large storage capacity with reasonable
performance characteristics for large sequential workloads, but should
be avoided where many small transfers will be occurring.

-Nathanael

From owner-freebsd-fs@FreeBSD.ORG Sun Jun 28 20:14:30 2009
Message-Id: <09277772-9C54-4AE6-A147-CB6A4ED38C48@exscape.org>
From: Thomas Backman
To: FreeBSD current
Date: Sun, 28 Jun 2009 22:14:16 +0200
References: <08D1E6DF-89D3-4887-9234-C3DB9164D794@exscape.org>
 <20090514133017.362075dhcdy7o2bs@webmail.leidinger.net>
 <7CD27FF0-CBFA-48B7-9E18-763D8C3ED9B8@exscape.org>
 <4A0C9B0C.4050403@jrv.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs send -R segfault, anyone else?

On May 15, 2009, at 11:30 AM, Thomas Backman wrote:

>
> On May 15, 2009, at 12:28 AM, James R. Van Artsdalen wrote:
>
>> Thomas Backman wrote:
>>> [root@chaos ~]# zfs send -R -I $OLD tank@$NOW > diff-snap
>>> [root@chaos ~]# cat diff-snap | zfs recv -Fvd slave
>>> Segmentation fault: 11 (core dumped)
>>>
>>> Same kinda backtrace, but what's up with strcmp()?
>>> I suppose the issue stems from libzfs, and is not within libc:
>>
>> Different problem The SIGSEGV is happening in strcmp because it is
>> called with strcmp(0,0)
>> and tries to dereference address -4 (probably another bug itself).
>>
>> This hack gets around the issue but someone familiar with this needs to
>> decide the correct action.
>>
>> The first change is actually unrelated (a sorry attempt at fixing the
>> previous zfs send bug).
>>
>> The last change may be unnecessary as that case may never happen unless
>> the pool can be renamed?
>>
>> [... patch ...]
>
> Thanks! This list is pretty impressive. :)
> I can't validate how correct the fix is, considering my lacking
> knowledge in C (I know the basics, but kernel/related programming?
> no way!), but I CAN say that it appears to work just fine!
>
> Regards,
> Thomas
>

Any news on this? The bug's been around for a long time, and a fix has
been around for at least 1.5 months now, and AFAIK the bug still lives.
The patch, again (I can't vouch for its correctness, but I can certainly
say that it works just fine *for me*) follows.

Regards,
Thomas

Index: cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c
===================================================================
--- cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c	(revision 194851)
+++ cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c	(working copy)
@@ -239,6 +239,8 @@
                 char *propname = nvpair_name(elem);
                 zfs_prop_t prop = zfs_name_to_prop(propname);
                 nvlist_t *propnv;
+                if (prop == ZPROP_INVAL)
+                        continue;
 
                 if (!zfs_prop_user(propname) && zfs_prop_readonly(prop))
                         continue;
@@ -1126,7 +1128,7 @@
                 uint64_t originguid = 0;
                 uint64_t stream_originguid = 0;
                 uint64_t parent_fromsnap_guid, stream_parent_fromsnap_guid;
-                char *fsname, *stream_fsname;
+                char *fsname, *stream_fsname, *p1, *p2;
 
                 nextfselem = nvlist_next_nvpair(local_nv, fselem);
 
@@ -1295,10 +1297,13 @@
                     "parentfromsnap", &stream_parent_fromsnap_guid));
 
                 /* check for rename */
+                p1 = strrchr(fsname, '/');
+                p2 = strrchr(stream_fsname, '/');
+
                 if ((stream_parent_fromsnap_guid != 0 &&
                     stream_parent_fromsnap_guid != parent_fromsnap_guid) ||
-                    strcmp(strrchr(fsname, '/'),
-                    strrchr(stream_fsname, '/')) != 0) {
+                    (p1 != NULL && p2 != NULL && strcmp (p1, p2) != 0) ||
+                    ((p1 == NULL) ^ (p2 == NULL))) {
                         nvlist_t *parent;
                         char tryname[ZFS_MAXNAMELEN];
 
@@ -1317,7 +1322,7 @@
                                 VERIFY(0 == nvlist_lookup_string(parent, "name",
                                     &pname));
                                 (void) snprintf(tryname, sizeof (tryname),
-                                    "%s%s", pname, strrchr(stream_fsname, '/'));
+                                    "%s%s", pname, p2 ? p2 : "");
                         } else {
                                 tryname[0] = '\0';
                                 if (flags.verbose) {

From owner-freebsd-fs@FreeBSD.ORG Sun Jun 28 20:41:45 2009
In-Reply-To: <09277772-9C54-4AE6-A147-CB6A4ED38C48@exscape.org>
References: <08D1E6DF-89D3-4887-9234-C3DB9164D794@exscape.org>
 <20090514133017.362075dhcdy7o2bs@webmail.leidinger.net>
 <7CD27FF0-CBFA-48B7-9E18-763D8C3ED9B8@exscape.org>
 <4A0C9B0C.4050403@jrv.org>
<09277772-9C54-4AE6-A147-CB6A4ED38C48@exscape.org> Date: Sun, 28 Jun 2009 13:41:43 -0700 X-Google-Sender-Auth: 8e6ff7087de777d1 Message-ID: <3c1674c90906281341w4b235dd7y809e1b23978ad5c3@mail.gmail.com> From: Kip Macy To: Thomas Backman Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, FreeBSD current Subject: Re: zfs send -R segfault, anyone else? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 28 Jun 2009 20:41:45 -0000 I'm a bit preoccupied at the moment. Keep reminding me ... -Kip On Sun, Jun 28, 2009 at 1:14 PM, Thomas Backman wrote= : > On May 15, 2009, at 11:30 AM, Thomas Backman wrote: >> >> On May 15, 2009, at 12:28 AM, James R. Van Artsdalen wrote: >> >>> Thomas Backman wrote: >>>> >>>> [root@chaos ~]# zfs send -R -I $OLD tank@$NOW > diff-snap >>>> [root@chaos ~]# cat diff-snap | zfs recv -Fvd slave >>>> Segmentation fault: 11 (core dumped) >>>> >>>> Same kinda backtrace, but what's up with strcmp()? >>>> I suppose the issue stems from libzfs, and is not within libc: >>> >>> Different problem =A0The SIGSEGV is happening in strcmp because it is >>> called with strcmp(0,0) >>> and tries to dereference address -4 (probably another bug itself). >>> >>> This hack gets around the issue but someone familiar with this needs to >>> decide the correct action. >>> >>> The first change is actually unrelated (a sorry attempt at fixing the >>> previous zfs send bug). >>> >>> The last change may be unnecessary as that case may never happen unless >>> the pool can be renamed? >>> >>> [... patch ...] >> >> Thanks! This list is pretty impressive. :) >> I can't validate how correct the fix is, considering my lacking knowledg= e >> in C (I know the basics, but kernel/related programming? no way!), but I= CAN >> say that it appears to work just fine! 
>>
>> Regards,
>> Thomas
>>
> Any news on this? The bug's been around for a long time, and a fix has been
> around for at least 1.5 months now, and AFAIK the bug still lives.
> The patch, again (I can't vouch for its correctness, but I can certainly say
> that it works just fine *for me*) follows.
>
> Regards,
> Thomas
>
> Index: cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c
> ===================================================================
> --- cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c  (revision 194851)
> +++ cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c  (working copy)
> @@ -239,6 +239,8 @@
>                 char *propname = nvpair_name(elem);
>                 zfs_prop_t prop = zfs_name_to_prop(propname);
>                 nvlist_t *propnv;
> +               if (prop == ZPROP_INVAL)
> +                       continue;
>
>                 if (!zfs_prop_user(propname) && zfs_prop_readonly(prop))
>                         continue;
> @@ -1126,7 +1128,7 @@
>                 uint64_t originguid = 0;
>                 uint64_t stream_originguid = 0;
>                 uint64_t parent_fromsnap_guid, stream_parent_fromsnap_guid;
> -               char *fsname, *stream_fsname;
> +               char *fsname, *stream_fsname, *p1, *p2;
>
>                 nextfselem = nvlist_next_nvpair(local_nv, fselem);
>
> @@ -1295,10 +1297,13 @@
>                     "parentfromsnap", &stream_parent_fromsnap_guid));
>
>                 /* check for rename */
> +               p1 = strrchr(fsname, '/');
> +               p2 = strrchr(stream_fsname, '/');
> +
>                 if ((stream_parent_fromsnap_guid != 0 &&
>                     stream_parent_fromsnap_guid != parent_fromsnap_guid) ||
> -                   strcmp(strrchr(fsname, '/'),
> -                   strrchr(stream_fsname, '/')) != 0) {
> +                   (p1 != NULL && p2 != NULL && strcmp (p1, p2) != 0) ||
> +                    ((p1 == NULL) ^ (p2 == NULL))) {
>                         nvlist_t *parent;
>                         char tryname[ZFS_MAXNAMELEN];
>
> @@ -1317,7 +1322,7 @@
>                                 VERIFY(0 == nvlist_lookup_string(parent, "name",
>                                     &pname));
>                                 (void) snprintf(tryname, sizeof (tryname),
> -                                   "%s%s", pname, strrchr(stream_fsname, '/'));
> +                                   "%s%s", pname, p2 ? p2 : "");
>                         } else {
>                                 tryname[0] = '\0';
>                                 if (flags.verbose) {
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

-- 
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.
Edmund Burke

From owner-freebsd-fs@FreeBSD.ORG Sun Jun 28 22:13:03 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EA539106564A for ; Sun, 28 Jun 2009 22:13:03 +0000 (UTC) (envelope-from unixtools@hotmail.com) Received: from blu0-omc4-s38.blu0.hotmail.com (blu0-omc4-s38.blu0.hotmail.com [65.55.111.177]) by mx1.freebsd.org (Postfix) with ESMTP id B483A8FC25 for ; Sun, 28 Jun 2009 22:13:03 +0000 (UTC) (envelope-from unixtools@hotmail.com) Received: from BLU142-W3 ([65.55.111.135]) by blu0-omc4-s38.blu0.hotmail.com with Microsoft SMTPSVC(6.0.3790.3959); Sun, 28 Jun 2009 14:59:58 -0700 Message-ID: X-Originating-IP: [68.36.223.118] From: Sunil Sunder Raj To: Date: Sun, 28 Jun 2009 21:59:58 +0000 Importance: Normal MIME-Version: 1.0 X-OriginalArrivalTime: 28 Jun 2009 21:59:58.0593 (UTC) FILETIME=[C738FF10:01C9F83B] Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: File System performance X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 28 Jun 2009 22:13:04 -0000

Does the FreeBSD ports collection have any tool to debug or benchmark I/O
performance? "systat -iostat" just gives me the output below:

          /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   |

          /0   /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
cpu  user|XX
     nice|
   system|X
interrupt|
     idle|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

          /0   /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
ad0   MB/s
       tps|
twed0 MB/s
       tps|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX411.32

How do I get detailed information on which process is doing the I/O?
Something like perfmon.
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 29 00:26:35 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 22992106564A for ; Mon, 29 Jun 2009 00:26:35 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id CD1C88FC12 for ; Mon, 29 Jun 2009 00:26:34 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AocGAO+fR0qDaFvL/2dsb2JhbACOZQG7aYQNBQ X-IronPort-AV: E=Sophos;i="4.42,305,1243828800"; d="scan'208";a="37686296" Received: from nile.cs.uoguelph.ca ([131.104.91.203]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 28 Jun 2009 19:57:59 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by nile.cs.uoguelph.ca (Postfix) with ESMTP id 9298B8D40DC; Sun, 28 Jun 2009 19:57:59 -0400 (EDT) X-Virus-Scanned: amavisd-new at nile.cs.uoguelph.ca Received: from nile.cs.uoguelph.ca ([127.0.0.1]) by localhost (nile.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jM37aPjy+qfV; Sun, 28 Jun 2009 19:57:58 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by nile.cs.uoguelph.ca (Postfix) with ESMTP id BDF9F8D40A3; Sun, 28 Jun 2009 19:57:58 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id n5T00Df05807; Sun, 28 Jun 2009 20:00:14 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Sun, 28 Jun 2009 20:00:13 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca
To: freebsd-current@freebsd.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org Subject: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Jun 2009 00:26:35 -0000

I just noticed that when I do the following:
- start a large write to an NFS mounted fs
- network partition the server (unplug a net cable)
- do a "umount -f " on the machine

that it gets stuck trying to write dirty blocks to the server.

I had, in the past, assumed that a "umount -f" of an NFS mount would be
used to get rid of an NFS mount on an unresponsive server and that loss
of "writes in progress" would be expected to happen.

Does that sound correct? (In other words, am I seeing a bug or a
feature?)

Thanks in advance for any info, rick
ps: I have a simple "fix" if this is a bug, but I wanted to check before
    submitting a patch.
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 29 00:32:20 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DF61E1065676 for ; Mon, 29 Jun 2009 00:32:20 +0000 (UTC) (envelope-from nhoyle@hoyletech.com) Received: from mout.perfora.net (mout.perfora.net [74.208.4.194]) by mx1.freebsd.org (Postfix) with ESMTP id A898B8FC18 for ; Mon, 29 Jun 2009 00:32:20 +0000 (UTC) (envelope-from nhoyle@hoyletech.com) Received: from [192.168.1.10] (pool-96-231-140-65.washdc.fios.verizon.net [96.231.140.65]) by mrelay.perfora.net (node=mrus1) with ESMTP (Nemesis) id 0MKpCa-1ML4n92qqC-000CHU; Sun, 28 Jun 2009 20:32:19 -0400 Message-ID: <4A480B8C.1060708@hoyletech.com> Date: Sun, 28 Jun 2009 20:32:12 -0400 From: Nathanael Hoyle User-Agent: Thunderbird 2.0.0.22 (Windows/20090605) MIME-Version: 1.0 To: Rick Macklem References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Provags-ID: V01U2FsdGVkX19C1ctppmWifb7SoHsEFW3XglbxdKhFVYx+NCE ZowGOP4PcyJnGxQm5vjyVET45YZlyGps3ZIua2c6d8pdWlj1hK W+gzRSV3Gx1sc7Z0e9crhnFS367sNp7 Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Jun 2009 00:32:21 -0000

Rick Macklem wrote:
> I just noticed that when I do the following:
> - start a large write to an NFS mounted fs
> - network partition the server (unplug a net cable)
> - do a "umount -f " on the machine
>
> that it gets stuck trying to write dirty blocks to the server.
>
> I had, in the past, assumed that a "umount -f" of an NFS mount would be
> used to get rid of an NFS mount on an unresponsive server and that loss
> of "writes in progress" would be expected to happen.
>
> Does that sound correct? (In other words, am I seeing a bug or a
> feature?)
>
> Thanks in advance for any info, rick
> ps: I have a simple "fix" if this is a bug, but I wanted to check before
> submitting a patch.

I think the answer is probably "it's a feature, not a bug", but that
depends on your NFS mount options, which you didn't give. I'd suggest you
read up on NFS soft versus hard mounts. I think you're seeing the latter
and expecting the former behavior.

The first hit I found Googling seems pretty decent and, though taken from
Linux docs, should still apply:

http://tldp.org/HOWTO/NFS-HOWTO/client.html

Under section 4.3.1 "Soft vs. Hard Mounting" there's a basic description.

Best of luck,
-Nathanael

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 29 04:53:05 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BABA4106564A; Mon, 29 Jun 2009 04:53:05 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from dglawrence.com (75-148-92-17-Oregon.hfc.comcastbusiness.net [75.148.92.17]) by mx1.freebsd.org (Postfix) with ESMTP id 936088FC0C; Mon, 29 Jun 2009 04:53:05 +0000 (UTC) (envelope-from dg@dglawrence.com) Received: from tnn.dglawrence.com (localhost [127.0.0.1]) by dglawrence.com (8.14.1/8.14.1) with ESMTP id n5T4r5K7093361; Sun, 28 Jun 2009 21:53:05 -0700 (PDT) (envelope-from dg@dglawrence.com) Received: (from dg@localhost) by tnn.dglawrence.com (8.14.1/8.14.1/Submit) id n5T4r4AK093302; Sun, 28 Jun 2009 21:53:04 -0700 (PDT) (envelope-from dg@dglawrence.com) X-Authentication-Warning: tnn.dglawrence.com: dg set sender to dg@dglawrence.com using -f Date: Sun, 28 Jun 2009 21:53:04 -0700 From: David G Lawrence To: Rick Macklem Message-ID: <20090629045304.GI39302@tnn.dglawrence.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Greylist: Sender IP whitelisted, not delayed by
milter-greylist-3.0 (dglawrence.com [127.0.0.1]); Sun, 28 Jun 2009 21:53:05 -0700 (PDT) Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Jun 2009 04:53:06 -0000

> I just noticed that when I do the following:
> - start a large write to an NFS mounted fs
> - network partition the server (unplug a net cable)
> - do a "umount -f " on the machine
>
> that it gets stuck trying to write dirty blocks to the server.
>
> I had, in the past, assumed that a "umount -f" of an NFS mount would be
> used to get rid of an NFS mount on an unresponsive server and that loss
> of "writes in progress" would be expected to happen.
>
> Does that sound correct? (In other words, am I seeing a bug or a
> feature?)
>
> Thanks in advance for any info, rick
> ps: I have a simple "fix" if this is a bug, but I wanted to check before
> submitting a patch.

I would say that you are seeing a bug. -f is supposed to mean "force",
of course. Any buffers or outstanding transactions should be terminated
immediately.

Oh, and most of us know that you, as one of the NFS developers in the
past, well know the difference between hard and soft NFS mounts. ;-)

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
Pave the road of life with opportunities.
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 29 10:16:37 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 99685106564A for ; Mon, 29 Jun 2009 10:16:37 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-bw0-f216.google.com (mail-bw0-f216.google.com [209.85.218.216]) by mx1.freebsd.org (Postfix) with ESMTP id 1C0548FC17 for ; Mon, 29 Jun 2009 10:16:36 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: by bwz12 with SMTP id 12so234126bwz.43 for ; Mon, 29 Jun 2009 03:16:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:message-id:subject:from:to:cc :content-type:content-transfer-encoding; bh=mIVQI17NOExRbSRmRK8ELiJp93MjRgCudWH9N7wy2HM=; b=U1BYC7EDyenimIek8btf+9DK3MyF1ds5RUCH4Vr8NJWTbXxKG5+pX5cQCS1t9Vd596 YNbRVK5fluod8lMJQLVvMENGFjYL87LwY6LTKcRhaM7jfLwZsPwy0rkss5TvEKzdKLtc qADjTnZcga9B78RMovkPy9SQxF1aXICpZCwao= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=EuWYtjmT4hlxQVMEHFcjLCQZZN0Vfs+i4Mc9LYpeXy5xnc9ApubaNa0CEB+fKdyqHS f6hVmZRJprkN66cwU9bMQmWS6feblBRBiUMMT8Lf6rcGsaEmDHvU+0cstA7VTEL3OZJL lojFzyUq8PTrEVJcvYCjZPi1E0Vji1Yhalnb0= MIME-Version: 1.0 Sender: asmrookie@gmail.com Received: by 10.223.113.9 with SMTP id y9mr4272783fap.19.1246269371623; Mon, 29 Jun 2009 02:56:11 -0700 (PDT) In-Reply-To: References: Date: Mon, 29 Jun 2009 11:56:11 +0200 X-Google-Sender-Auth: f51c70cf27ded9c1 Message-ID: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> From: Attilio Rao To: Rick Macklem Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: 
umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Jun 2009 10:16:37 -0000

2009/6/29 Rick Macklem :
> I just noticed that when I do the following:
> - start a large write to an NFS mounted fs
> - network partition the server (unplug a net cable)
> - do a "umount -f " on the machine
>
> that it gets stuck trying to write dirty blocks to the server.
>
> I had, in the past, assumed that a "umount -f" of an NFS mount would be
> used to get rid of an NFS mount on an unresponsive server and that loss
> of "writes in progress" would be expected to happen.
>
> Does that sound correct? (In other words, am I seeing a bug or a feature?)

While that should be true in principle (immediate shutdown of the fs
operation and unmounting of the partition), it is totally impossible to
have it completely unsleeping, so it can happen that umount -f also
sleeps / delays for some time (example: vflush).
Currently, umount -f is one of the most complicated things to handle in
our VFS because it requires that vnodes can be reclaimed at any moment,
adding complexity and the possibility of races.

What's the fix for your problem?

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A.
Einstein

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 29 11:06:57 2009 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D1246106564A for ; Mon, 29 Jun 2009 11:06:57 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id BDE138FC12 for ; Mon, 29 Jun 2009 11:06:57 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n5TB6vA2046311 for ; Mon, 29 Jun 2009 11:06:57 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n5TB6vTK046307 for freebsd-fs@FreeBSD.org; Mon, 29 Jun 2009 11:06:57 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 29 Jun 2009 11:06:57 GMT Message-Id: <200906291106.n5TB6vTK046307@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Jun 2009 11:06:58 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.  Description
--------------------------------------------------------------------------------
o kern/135594  fs     [zfs] Single dataset unresponsive with Samba
o kern/135546  fs     [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135480  fs     [zfs] panic: lock &arg.lock already initialized
o kern/135469  fs     [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135412  fs     [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREA
o bin/135314   fs     [zfs] assertion failed for zdb(8) usage
o kern/135050  fs     [zfs] ZFS clears/hides disk errors on reboot
f kern/134496  fs     [zfs] [panic] ZFS pool export occasionally causes a ke
o kern/134491  fs     [zfs] Hot spares are rather cold...
o kern/133980  fs     [panic] [ffs] panic: ffs_valloc: dup alloc
o kern/133676  fs     [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133614  fs     [smbfs] [panic] panic: ffs_truncate: read-only filesys
o kern/133373  fs     [zfs] umass attachment causes ZFS checksum errors, dat
o kern/133174  fs     [msdosfs] [patch] msdosfs must support utf-encoded int
f kern/133150  fs     [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w
o kern/133134  fs     [zfs] Missing ZFS zpool labels
f kern/133020  fs     [zfs] [panic] inappropriate panic caused by zfs. Pani
o kern/132960  fs     [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132597  fs     [tmpfs] [panic] tmpfs-related panic while interrupting
o kern/132551  fs     [zfs] ZFS locks up on extattr_list_link syscall
o kern/132397  fs     reboot causes filesystem corruption (failure to sync b
o kern/132337  fs     [zfs] [panic] kernel panic in zfs_fuid_create_cred
o kern/132331  fs     [ufs] [lor] LOR ufs and syncer
o kern/132237  fs     [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs     [panic] File System Hard Crashes
f kern/132068  fs     [zfs] page fault when using ZFS over NFS on 7.1-RELEAS
o kern/131995  fs     [nfs] Failure to mount NFSv4 server
o kern/131360  fs     [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs     [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs     makefs: error "Bad file descriptor" on the mount poin
o kern/131086  fs     [ext2fs] [patch] mkfs.ext2 creates rotten partition
o kern/130979  fs     [smbfs] [panic] boot/kernel/smbfs.ko
o kern/130920  fs     [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130229  fs     [iconv] usermount fails on fs that need iconv
o kern/130210  fs     [nullfs] Error by check nullfs
o kern/129760  fs     [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs     [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs     [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs     [panic] non-userfriendly panic when trying to mount(8)
o kern/129148  fs     [zfs] [panic] panic on concurrent writing & rollback
o kern/129059  fs     [zfs] [patch] ZFS bootloader whitelistable via WITHOUT
f kern/128829  fs     smbd(8) causes periodic panic on 7-RELEASE
o kern/128633  fs     [zfs] [lor] lock order reversal in zfs
o kern/128514  fs     [zfs] [mpt] problems with ZFS and LSILogic SAS/SATA Ad
f kern/128173  fs     [ext2fs] ls gives "Input/output error" on mounted ext3
o kern/127659  fs     [tmpfs] tmpfs memory leak
o kern/127492  fs     [zfs] System hang on ZFS input-output
o kern/127420  fs     [gjournal] [panic] Journal overflow on gmirrored gjour
o kern/127213  fs     [tmpfs] sendfile on tmpfs data corruption
o kern/127029  fs     [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs     [ufs] [panic] Kernel panics while mounting an UFS file
s kern/125738  fs     [zfs] [request] SHA256 acceleration in ZFS
o kern/125644  fs     [zfs] [panic] zfs unfixable fs errors caused panic whe
f kern/125536  fs     [ext2fs] ext 2 mounts cleanly but fails on commands li
o kern/125149  fs     [nfs] [panic] changing into .zfs dir from nfs client c
f kern/124621  fs     [ext3] [patch] Cannot mount ext2fs partition
f bin/124424   fs     [zfs] zfs(8): zfs list -r shows strange snapshots' siz
o kern/123939  fs     [msdosfs] corrupts new files
o kern/122888  fs     [zfs] zfs hang w/ prefetch on, zil off while running t
o kern/122380  fs     [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o kern/122173  fs     [zfs] [panic] Kernel Panic if attempting to replace a
o bin/122172   fs     [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o kern/122047  fs     [ext2fs] [patch] incorrect handling of UF_IMMUTABLE /
o kern/122038  fs     [tmpfs] [panic] tmpfs: panic: tmpfs_alloc_vp: type 0xc
o bin/121898   fs     [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121779   fs     [ufs] snapinfo(8) (and related tools?) only work for t
o kern/121770  fs     [zfs] ZFS on i386, large file or heavy I/O leads to ke
o bin/121366   fs     [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs     [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991  fs     [panic] [fs] [snapshot] System crashes when manipulati
o kern/120483  fs     [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs     [ntfs] [patch] Sync style changes between NetBSD and F
o bin/120288   fs     zfs(8): "zfs share -a" does not send SIGHUP to mountd
f kern/119735  fs     [zfs] geli + ZFS + samba starting on boot panics 7.0-B
o kern/118912  fs     [2tb] disk sizing/geometry problem with large array
o misc/118855  fs     [zfs] ZFS-related commands are nonfunctional in fixit
o kern/118713  fs     [minidump] [patch] Display media size required for a k
o kern/118320  fs     [zfs] [patch] NFS SETATTR sometimes fails to set file
o bin/118249   fs     mv(1): moving a directory changes its mtime
o kern/118107  fs     [ntfs] [panic] Kernel panic when accessing a file at N
o bin/117315   fs     [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs     [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs     [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs     [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o kern/116913  fs     [ffs] [panic] ffs_blkfree: freeing free block
p kern/116608  fs     [msdosfs] [patch] msdosfs fails to check mount options
o kern/116583  fs     [ffs] [hang] System freezes for short time when using
o kern/116170  fs     [panic] Kernel panic when mounting /tmp
o kern/115645  fs     [snapshots] [panic] lockmgr: thread 0xc4c00d80, not ex
o bin/115361   fs     [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs     [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs     [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs     [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs     [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs     [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs     [patch] [request] mount(8): add support for relative p
o kern/113180  fs     [zfs] Setting ZFS nfsshare property does not cause inh
o bin/113049   fs     [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs     [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs     [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs     [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs     [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs     [msdosfs] mount_msdosfs: msdosfs_iconv: Operation not
o kern/109010  fs     [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs     [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106030  fs     [ufs] [panic] panic in ufs from geom when a dead disk
o kern/105093  fs     [ext2fs] [patch] ext2fs on read-only media cannot be m
o kern/104406  fs     [ufs] Processes get stuck in "ufs" state under persist
f kern/104133  fs     [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs     [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs     [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs     [ntfs] mount_ntfs ignorant of cluster sizes
o kern/97377   fs     [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs     [iso9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs     [ufs] rename on UFS filesystem is not atomic
o kern/94769   fs     [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs     [smbfs] smbfs may cause double unlock
o kern/93942   fs     [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs     [ffs] [hang] Filling a filesystem while creating a sna
f kern/91568   fs     [ufs] [panic] writing to UFS/softupdates DVD media in
o kern/91134   fs     [smbfs] [patch] Preserve access and modification time
a kern/90815   fs     [smbfs] [patch] SMBFS with character conversions somet
o kern/89991   fs     [ufs] softupdates with mount -ur causes fs UNREFS
o kern/88657   fs     [smbfs] windows client hang when browsing a samba shar
o kern/88266   fs     [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o kern/87859   fs     [smbfs] System reboot while umount smbfs.
o kern/86587   fs     [msdosfs] rm -r /PATH fails with lots of small files
o kern/85326   fs     [smbfs] [panic] saving a file via samba to an overquot
o kern/84589   fs     [2TB] 5.4-STABLE unresponsive during background fsck 2
o kern/80088   fs     [smbfs] Incorrect file time setting on NTFS mounted vi
o kern/77826   fs     [ext2fs] ext2fs usb filesystem will not mount RW
o kern/73484   fs     [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs     [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs     [ntfs] NTFS cannot "see" files on a WinXP filesystem
o kern/68978   fs     [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs     [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs     [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs     [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs     [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs     [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs     [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs     [smbfs] System reboot with dead smb mount and umount
o kern/18874   fs     [2TB] 32bit NFS servers export wrong negative values t

143 problems total.
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 29 14:36:28 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CD02A1065695; Mon, 29 Jun 2009 14:36:28 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 6B17F8FC26; Mon, 29 Jun 2009 14:36:28 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApoEAC9uSEqDaFvK/2dsb2JhbADONYI2AYFWBQ X-IronPort-AV: E=Sophos;i="4.42,309,1243828800"; d="scan'208";a="37743957" Received: from fraser.cs.uoguelph.ca ([131.104.91.202]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 29 Jun 2009 10:36:27 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id 8CC22109C25E; Mon, 29 Jun 2009 10:36:27 -0400 (EDT) X-Virus-Scanned: amavisd-new at fraser.cs.uoguelph.ca Received: from fraser.cs.uoguelph.ca ([127.0.0.1]) by localhost (fraser.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id UGfVInBG8Jlz; Mon, 29 Jun 2009 10:36:27 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id EFE00109C257; Mon, 29 Jun 2009 10:36:26 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id n5TEchd07187; Mon, 29 Jun 2009 10:38:43 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Mon, 29 Jun 2009 10:38:43 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: Attilio Rao In-Reply-To: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> Message-ID: References: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; 
charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Jun 2009 14:36:29 -0000

On Mon, 29 Jun 2009, Attilio Rao wrote:

> 2009/6/29 Rick Macklem :
>> I just noticed that when I do the following:
>> - start a large write to an NFS mounted fs
>> - network partition the server (unplug a net cable)
>> - do a "umount -f <mntpoint>" on the machine
>>
>> that it gets stuck trying to write dirty blocks to the server.
>>
>> I had, in the past, assumed that a "umount -f" of an NFS mount would be
>> used to get rid of an NFS mount on an unresponsive server and that loss
>> of "writes in progress" would be expected to happen.
>>
>> Does that sound correct? (In other words, an I seeing a bug or a feature?)
>
> While that should be real in principle (immediate shutdown of the fs
> operation and unmounting of the partition) it is totally impossible to
> have it completely unsleeping, so it can happen that also umount -f
> sleeps / delays for some times (example: vflush).
> Currently, umount -f is one of the most complicated thing to handle in
> our VFS because it puts as requirement that vnodes can be reclaimed in
> any moment, adding complexity and possibility for races.
>
Yes, agreed. And I like to leave that stuff to more clever chaps than I:-)

> What's the fix for your problem?
>
Well, when I tested it I found that it got stuck in two places, both
calls to VFS_SYNC(). The first was a
    sync();
right at the beginning of umount.c.
- All I did for that one is move it to after the code that handles
  option processing and change it to
      if ((fflag & MNT_FORCE) == 0)
          sync();
  so that it isn't done for the "-f" case. (I believe the sync(); call
  at the beginning of umount is only a performance optimization, so I
  don't think not doing it for "-f" should break anything.)
- the second happened just before the VFS_UNMOUNT() call in the
  umount(2) system call. The code looks like:
      if (((mp->mnt_flag & MNT_RDONLY) ||
           (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0)
- Although it was tempting to reverse the order of VFS_SYNC() and the
  test for MNT_FORCE, I thought that might have a negative impact on
  other file systems, since it avoided doing the VFS_SYNC(), so...

- Instead, I just put a check for MNTK_UNMOUNTF at the beginning of
  nfs_sync(), so that it returns EBUSY for this case instead of getting
  stuck trying to flush().

Assuming that I'm right w.r.t. the "sync();" at the beginning of umount.c,
it simply ensures that the umount command thread makes it as far as
VFS_UNMOUNT()->nfs_unmount(), so that the forced dismount proceeds. It
kills RPCs in progress before doing the vflush() and, since no new RPCs
can be done once MNTK_UNMOUNTF is set (it is checked at the beginning of
a request), the vflush() won't actually flush anything to the server.

As such, "umount -f" is pretty well guaranteed to throw away the dirty
buffers. I believe this is correct behaviour, but it would mean that a
user/sysadmin that uses "umount -f" for cases where the server is still
functioning, but slow, will lose data when they probably don't expect to.

Does this help? rick
ps: During simple testing, it has worked ok. It waits about 1 minute for
    the RPC threads to shut down, but the "umount -f" does complete after
    that happens. If the consensus seems to be that patching this is a
    good idea, I'll get some more testing done.
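The two changes described in this message can be sketched as a small userland model. This is only a sketch under stated assumptions, not the actual patch: every `sk_`/`SK_` name is hypothetical, the flag values are invented (the real ones come from the kernel headers), the step that sets the unmount-in-progress flag before the sync is assumed, and a hang on an unreachable server is modelled as a special return value rather than a real sleep.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values for this sketch only; not the kernel's. */
#define SK_MNT_RDONLY    0x1u
#define SK_MNT_FORCE     0x2u
#define SK_MNTK_UNMOUNTF 0x4u

/* Models "this call would hang on a dead server"; not a real errno. */
#define SK_WOULD_HANG    (-1)

/* Stand-in for the few struct mount fields the message talks about. */
struct sk_mount {
	unsigned int mnt_flag;
	unsigned int mnt_kern_flag;
	bool server_reachable;
	int dirty_bufs;
};

/* umount(8) side: do the global sync() only when -f was not given. */
bool
sk_should_presync(unsigned int fflag)
{
	return ((fflag & SK_MNT_FORCE) == 0);
}

/* Assumption: a forced unmount marks the mount before any sync is tried. */
void
sk_begin_unmount(struct sk_mount *mp, unsigned int flags)
{
	if ((flags & SK_MNT_FORCE) != 0)
		mp->mnt_kern_flag |= SK_MNTK_UNMOUNTF;
}

/* nfs_sync() side: bail out with EBUSY once a forced unmount has begun. */
int
sk_nfs_sync(struct sk_mount *mp)
{
	if ((mp->mnt_kern_flag & SK_MNTK_UNMOUNTF) != 0)
		return (EBUSY);
	if (!mp->server_reachable)
		return (SK_WOULD_HANG);
	mp->dirty_bufs = 0;	/* everything was written back */
	return (0);
}

/* Same shape as the quoted test that guards the VFS_UNMOUNT() call. */
bool
sk_may_call_vfs_unmount(struct sk_mount *mp, unsigned int flags)
{
	return ((mp->mnt_flag & SK_MNT_RDONLY) != 0 ||
	    sk_nfs_sync(mp) == 0 ||
	    (flags & SK_MNT_FORCE) != 0);
}
```

In this model a forced unmount of a mount whose server is unreachable never reaches the branch that would hang: the sync returns EBUSY, the MNT_FORCE test lets the unmount proceed, and the dirty buffers are left unwritten, which matches the data-loss semantics described above.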
From owner-freebsd-fs@FreeBSD.ORG Mon Jun 29 15:15:09 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2939110656B1; Mon, 29 Jun 2009 15:15:09 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id C048C8FC0C; Mon, 29 Jun 2009 15:15:08 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApoEAI93SEqDaFvL/2dsb2JhbADPDoQNBYE3 X-IronPort-AV: E=Sophos;i="4.42,309,1243828800"; d="scan'208";a="37749025" Received: from nile.cs.uoguelph.ca ([131.104.91.203]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 29 Jun 2009 11:15:06 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by nile.cs.uoguelph.ca (Postfix) with ESMTP id 0BF828D4116; Mon, 29 Jun 2009 11:15:05 -0400 (EDT) X-Virus-Scanned: amavisd-new at nile.cs.uoguelph.ca Received: from nile.cs.uoguelph.ca ([127.0.0.1]) by localhost (nile.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id WoLeBH0J7Zfy; Mon, 29 Jun 2009 11:15:03 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by nile.cs.uoguelph.ca (Postfix) with ESMTP id 7F0ED8D4072; Mon, 29 Jun 2009 11:15:03 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id n5TFHKe12786; Mon, 29 Jun 2009 11:17:20 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Mon, 29 Jun 2009 11:17:20 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: Nathanael Hoyle In-Reply-To: <4A480B8C.1060708@hoyletech.com> Message-ID: References: <4A480B8C.1060708@hoyletech.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, 
freebsd-current@freebsd.org Subject: Re: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Jun 2009 15:15:09 -0000

On Sun, 28 Jun 2009, Nathanael Hoyle wrote:

> I think the answer is probably "it's a feature, not a bug", but that depends
> on your NFS mount options which you didn't give. I'd suggest you read up on
> NFS soft versus hard mounts. I think you're seeing the latter and expecting
> the former behavior.
>
Well, part of the problem is that I'm working on a client that includes
NFSv4 and, at least for NFSv4, getting "intr" or "soft" mounts to work
correctly is nearly impossible. Since NFSv4 includes lock state operations
that must be strictly serialized and the state maintained in a consistent
way, you can't just "terminate" an RPC involving these Ops without breaking
all state handling. Also, I/O system calls generally aren't expected to
fail with EINTR and many (most??) apps get broken when this happens.

Personally, I believe that "hard" mounts plus the use of "umount -f" to
get rid of mounts against unresponsive servers is the preferred way to go
and the first step in this direction would be getting "umount -f" to work
for the above case (plus agreement that the semantics of "umount -f"
include "loss of recently written data").

There was a thread on this a few months ago, which I can't find, but there
is pr129760 w.r.t. FreeBSD locking up upon a "umount -f".

(Btw, I believe that Mac OS X has adopted this concept. It pops up a
"disconnect mount" window for unresponsive servers and does essentially
a "umount -f" if the user clicks "ok".)

> The first hit I found Googling seems pretty decent, though taken from Linux
> docs should still apply:
>
> http://tldp.org/HOWTO/NFS-HOWTO/client.html
>
> Under section 4.3.1 "Soft vs. Hard Mounting" there's a basic description.
> There was a time when SunOS/Solaris was considered the "gold standard" for NFS (but I suppose this is the Linux era;-). My recollection might be fuzzy, but I don't think SunOS had a "umount -f" in those days and I think "intr" was introduced after their first release, as an improvement over "soft", since NFS servers got really slow when running on 1985 hardware. Solaris10 does have a "umount -f" and the man page notes that data related to open files can be lost when it is used. (This would basically be the semantic "umount -f" on FreeBSD will have if the "sync"s aren't done.) rick From owner-freebsd-fs@FreeBSD.ORG Mon Jun 29 19:06:09 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4CDB710656D8; Mon, 29 Jun 2009 19:06:09 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx08.syd.optusnet.com.au (fallbackmx08.syd.optusnet.com.au [211.29.132.10]) by mx1.freebsd.org (Postfix) with ESMTP id 9A49C8FC38; Mon, 29 Jun 2009 19:06:08 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by fallbackmx08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id n5THrCej027527; Tue, 30 Jun 2009 03:53:12 +1000 Received: from c122-107-127-32.carlnfd1.nsw.optusnet.com.au (c122-107-127-32.carlnfd1.nsw.optusnet.com.au [122.107.127.32]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id n5THr8k7025295 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 30 Jun 2009 03:53:09 +1000 Date: Tue, 30 Jun 2009 03:53:09 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Rick Macklem In-Reply-To: Message-ID: <20090630010035.E37426@delplex.bde.org> References: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Attilio Rao , 
freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Jun 2009 19:06:09 -0000 On Mon, 29 Jun 2009, Rick Macklem wrote: > On Mon, 29 Jun 2009, Attilio Rao wrote: >> >> While that should be real in principle (immediate shutdown of the fs >> operation and unmounting of the partition) it is totally impossible to >> have it completely unsleeping, so it can happen that also umount -f >> sleeps / delays for some times (example: vflush). >> Currently, umount -f is one of the most complicated thing to handle in >> our VFS because it puts as requirement that vnodes can be reclaimed in >> any moment, adding complexity and possibility for races. >> > Yes, agreed. And I like to leave that stuff to more clever chaps than I:-) > >> What's the fix for your problem? >> > Well, when I tested it I found that it got stuck in two places, both > calls to VFS_SYNC(). The first was a > sync(); > right at the beginning of umount.c. > - All I did for that one is move it to after the code that handles > option processing and change it to > if ((fflag & MNT_FORCE) == 0) > sync(); > so that it isn't done for the "-f" case. (I believe the sync(); call > at the beginning of umount is only a performance optimization, so I > don't think not doing it for "-f" should break anything.) OK. This sync() is probably actually a performance pessimization, since it syncs all file systems while the internal sync in umount(2) only syncs the one being unmounted. > - the second happened just before the VFS_UNMOUNT() call in the > umount(2) system call. 
>   The code looks like:
>       if (((mp->mnt_flag & MNT_RDONLY) ||
>            (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0)
> - Although it was tempting to reverse the order of VFS_SYNC() and the
>   test for MNT_FORCE, I thought that might have a negative impact on
>   other file systems, since it avoided doing the VFS_SYNC(), so...
>
> - Instead, I just put a check for MNTK_UNMOUNTF at the beginning of
>   nfs_sync(), so that it returns EBUSY for this case instead of getting
>   stuck trying to flush().

OK. This sync is probably an optimization for correctness, since it
arranges to do as much as possible without forcing.

I checked ffs_mount() and found 2 large bugs, one related:
- in the only case that tends to cause problems, namely the non-readonly
  case, ffs_unmount() does a suspend which calls VOP_SYNC(..., MNT_SUSPEND),
  but after errors from this sync it checks neither MNT_FORCE nor
  error == ENXIO. I think the usual effect is the same as if the top-level
  unmount() didn't check MNT_FORCE after suspend failure: in problematic
  cases, we have an unrecoverable write, due to the device going away or
  just an i/o error, and this error has probably already occurred (only in
  rare cases will it be triggered by unmount). Then MNT_FORCE is
  essentially unused, and the ENXIO hack is not reached either, and
  unmount usually fails.
- the UFS_EXTATTR case destroys infrastructure before committing to
  succeeding. It used to be just broken on failure. Now it uses a hack
  to recover (call a constructor) on failure, but the recovery code is
  not reached in the usual case of failure -- when the suspension fails.

ffs_unmount() still seems to have no support for handling unrecoverable
write errors (short of you converting them to ENXIO by removing the
media). MNT_FORCE only meant FORCECLOSE for it. I see that old nfs was
similar, and you are now making MNT_FORCE stronger.

I thought that umount(8)'s man page documented -f being strongly forceful,
but checking it shows that it only documents a weak force like that of
FORCECLOSE (but not precisely enough). Perhaps a different flag should
be used for strong forcefulness. Weak forcefulness is still useful and
used for mount -f -u -- for remount we would never want errors in the
file system itself ignored. This use also shows that the generic
FORCECLOSE code must not ignore errors.

> Assuming that I'm right w.r.t. the "sync();" at the beginning of umount.c,
> it simply ensures that the umount command thread makes it as far as
> VFS_UNMOUNT()->nfs_unmount(), so that the forced dismount proceeds. It
> kills RPCs in progress before doing the vflush() and, since no new RPCs
> can be done once MNTK_UNMOUNTF is set (it is checked at the beginning of
> a request), the vflush() won't actually flush anything to the server.
>
> As such, "umount -f" is pretty well guaranteed to throw away the dirty
> buffers. I believe this is correct behaviour,

This is how I think ffs_mount() should work too -- It should be responsible
for throwing away the dirty buffers, while nothing else should discard
them. Now the discarding seems to be done by falling through to
g_vfs_done(), except g_vfs_done() is not reached in most cases (see
above). I don't like this -- at best we lose the opportunity to print
ffs-specific details about what was discarded. Falling through only
works for ENXIO anyway -- on other errors we should discard the unwritable
buffers in an fs-specific manner so as to write as many of the writable
buffers as possible.

> but it would mean that a
> user/sysadmin that uses "umount -f" for cases where the server is still
> functioning, but slow, will lose data when they probably don't expect to.

A new flag would help for this.
Bruce From owner-freebsd-fs@FreeBSD.ORG Mon Jun 29 19:55:04 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0DEA110656A4; Mon, 29 Jun 2009 19:55:04 +0000 (UTC) (envelope-from gthiel@smapper.com) Received: from qhexrelay2.hosting.inetserver.de (fw1.hostedoffice.ag [81.20.90.82]) by mx1.freebsd.org (Postfix) with ESMTP id 82C258FC1B; Mon, 29 Jun 2009 19:55:03 +0000 (UTC) (envelope-from gthiel@smapper.com) Received: from qhexhub1.hosting.inetserver.de (unknown [10.20.10.20]) by qhexrelay2.hosting.inetserver.de (Postfix) with ESMTP id B1ED8187086; Mon, 29 Jun 2009 21:29:20 +0200 (CEST) Received: from QHEXMBOX1.hosting.inetserver.de ([10.20.10.31]) by qhexhub1.hosting.inetserver.de ([10.20.10.225]) with mapi; Mon, 29 Jun 2009 21:37:39 +0200 Content-Type: multipart/mixed; boundary="_000_0FEF727F15922F4D8349632C35EE6C276FB1A4470AQHEXMBOX1host_" From: Gunther Thiel To: "rmacklem@uoguelph.ca" , "freebsd-current@freebsd.org" Date: Mon, 29 Jun 2009 21:37:37 +0200 Thread-Topic: umount -f implementation Thread-Index: Acn4UEBr2pLuAmXgTF6/pO1Lp8OXUAAoM6Qy Message-ID: <0FEF727F15922F4D8349632C35EE6C276FB1A4470A@QHEXMBOX1.hosting.inetserver.de> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: <0FEF727F15922F4D8349632C35EE6C276FB1A4470A@QHEXMBOX1.hosting.inetserver.de> acceptlanguage: en-US MIME-Version: 1.0 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "freebsd-fs@freebsd.org" Subject: AW: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Jun 2009 19:55:04 -0000 --_000_0FEF727F15922F4D8349632C35EE6C276FB1A4470AQHEXMBOX1host_ Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 
In practice, there are situations where one does want to get rid of a non-reachable mountpoint (specifically for NFS), which basically is not possible as of today.

A fix for the case where -f (or another new flag like it) is supplied would be highly appreciated.

Thanks,
Gunther

--
SmApper Technologies GmbH – +43 5372 6912 640 – www.smapper.com

----- Original Message -----
From: owner-freebsd-fs@freebsd.org <owner-freebsd-fs@freebsd.org>
To: freebsd-current@freebsd.org <freebsd-current@freebsd.org>
Cc: freebsd-fs@freebsd.org <freebsd-fs@freebsd.org>
Sent: Mon Jun 29 02:00:13 2009
Subject: umount -f implementation

I just noticed that when I do the following:
- start a large write to an NFS mounted fs
- network partition the server (unplug a net cable)
- do a "umount -f <mntpoint>" on the machine

that it gets stuck trying to write dirty blocks to the server.

I had, in the past, assumed that a "umount -f" of an NFS mount would be
used to get rid of an NFS mount on an unresponsive server and that loss
of "writes in progress" would be expected to happen.

Does that sound correct? (In other words, an I seeing a bug or a
feature?)

Thanks in advance for any info, rick
ps: I have a simple "fix" if this is a bug, but I wanted to check before
    submitting a patch.

--_000_0FEF727F15922F4D8349632C35EE6C276FB1A4470AQHEXMBOX1host_--

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 29 23:53:52 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2EF01106564A for ; Mon, 29 Jun 2009 23:53:52 +0000 (UTC) (envelope-from rick@kiwi-computer.com) Received: from hamlet.setfilepointer.com (hamlet.SetFilePointer.com [63.224.10.2]) by mx1.freebsd.org (Postfix) with SMTP id CE79D8FC0A for ; Mon, 29 Jun 2009 23:53:51 +0000 (UTC) (envelope-from rick@kiwi-computer.com) Received: (qmail 54739 invoked from network); 29 Jun 2009 18:27:10 -0500 Received: from keira.kiwi-computer.com (HELO kiwi-computer.com) (63.224.10.3) by hamlet.setfilepointer.com with SMTP; 29 Jun 2009 18:27:10 -0500 Received: (qmail 25161 invoked by uid 2001); 29 Jun 2009 23:27:10 -0000 Date: Mon, 29 Jun 2009 18:27:10 -0500 From: "Rick C. Petty" To: Nathanael Hoyle Message-ID: <20090629232710.GA24986@keira.kiwi-computer.com> References: <4A480B8C.1060708@hoyletech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4A480B8C.1060708@hoyletech.com> User-Agent: Mutt/1.4.2.3i Cc: freebsd-fs@freebsd.org Subject: Re: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: rick-freebsd2008@kiwi-computer.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Jun 2009 23:53:52 -0000

On Sun, Jun 28, 2009 at 08:32:12PM -0400, Nathanael Hoyle wrote:
> Rick Macklem wrote:
> >
> >Does that sound correct? (In other words, an I seeing a bug or a
> >feature?)
> >
> I think the answer is probably "it's a feature, not a bug", but that
> depends on your NFS mount options which you didn't give. I'd suggest
> you read up on NFS soft versus hard mounts.

I'm pretty sure the person working on NFSv4 for fbsd knows this difference.

> I think you're seeing the
> latter and expecting the former behavior.

Not necessarily true. I've experienced similar behavior and I only use
soft mounts (actually: "rw,soft,intr,bg,rdirplus"). In fact this bit me
last week when I wanted to move the NFS export on a server. I did the
move/rename, updated /etc/exports, and did a "killall -HUP mountd" on the
server and I attempted variations of "mount -u" and "umount -f" on the
clients. Subsequently, I had to restart most of the client machines, since:

- "mount -u" returned ESTALE
- "umount" returned EBUSY
- "umount -f" failed, I believe with ENXIO

In any case, "umount -f" absolutely has to work. What other option does an
admin have? Yes, expect potential data loss and expect the umount may not
return immediately (plain "umount" can take a while too). Instead, I saw a
bunch of these messages, when another process continued to write to a
geli-mounted md'd file on that stale filesystem:

kernel: GEOM_ELI: g_eli_read_done() failed md0.eli[READ(offset=1790541824, length=65536)]

-- Rick C.
Petty From owner-freebsd-fs@FreeBSD.ORG Tue Jun 30 14:30:04 2009 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2A9ED106567F for ; Tue, 30 Jun 2009 14:30:04 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 17A9E8FC1B for ; Tue, 30 Jun 2009 14:30:04 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n5UEU3nk052204 for ; Tue, 30 Jun 2009 14:30:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n5UEU31v052203; Tue, 30 Jun 2009 14:30:03 GMT (envelope-from gnats) Date: Tue, 30 Jun 2009 14:30:03 GMT Message-Id: <200906301430.n5UEU31v052203@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Andriy Gapon Cc: Subject: Re: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Andriy Gapon List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Jun 2009 14:30:04 -0000 The following reply was made to PR kern/135412; it has been noted by GNATS. From: Andriy Gapon To: bug-followup@FreeBSD.org, danny@cs.huji.ac.il Cc: Subject: Re: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error Date: Tue, 30 Jun 2009 17:22:47 +0300 Danny, maybe you misunderstood Gavin's question or maybe I misunderstood your reply. I have stable/7 amd64 from Jun 26, I have upgraded zpool to version 13 (as reported by 'zpool upgrade' command) and I have upgraded zfs on-disk to version 3 (as reported by 'zfs upgrade'). 
And I can _not_ reproduce your problem using the program you provided - it successfully creates a file on the first run and it fails with '17 - File exists' on the subsequent ones. So, I'd like to re-iterate the question - what on-disk versions of zpool and zfs you have? Please provide output of 'zpool upgrade' and 'zfs upgrade' commands to avoid further uncertainties. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Jun 30 15:10:04 2009 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B6A0D10656C6 for ; Tue, 30 Jun 2009 15:10:04 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A3FF68FC19 for ; Tue, 30 Jun 2009 15:10:04 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n5UFA4OS084604 for ; Tue, 30 Jun 2009 15:10:04 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n5UFA4so084603; Tue, 30 Jun 2009 15:10:04 GMT (envelope-from gnats) Date: Tue, 30 Jun 2009 15:10:04 GMT Message-Id: <200906301510.n5UFA4so084603@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Danny Braniss Cc: Subject: Re: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Danny Braniss List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Jun 2009 15:10:05 -0000 The following reply was made to PR kern/135412; it has been noted by GNATS. 
From: Danny Braniss
To: Andriy Gapon
Cc: bug-followup@FreeBSD.org, danny@cs.huji.ac.il, danny@cs.huji.ac.il
Subject: Re: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error
Date: Tue, 30 Jun 2009 18:05:28 +0300

>
> Danny,
>
> maybe you misunderstood Gavin's question or maybe I misunderstood your reply.
>
> I have stable/7 amd64 from Jun 26, I have upgraded zpool to version 13 (as
> reported by 'zpool upgrade' command) and I have upgraded zfs on-disk to version 3
> (as reported by 'zfs upgrade').
> And I can _not_ reproduce your problem using the program you provided - it
> successfully creates a file on the first run and it fails with '17 - File exists'
> on the subsequent ones.
>
> So, I'd like to re-iterate the question - what on-disk versions of zpool and zfs
> you have?
> Please provide output of 'zpool upgrade' and 'zfs upgrade' commands to avoid
> further uncertainties.
>
> --
> Andriy Gapon

you have to run the program on a client that mounted the zfs volume via nfs.
it fails no matter what pool version, either 6 or 13
btw, it works fine if the server is solaris/v13

but just to answer your questions, dev is the server host

dev> zfs upgrade
This system is currently running ZFS filesystem version 3.

All filesystems are formatted with the current version.
dev> zpool upgrade
This system is currently running ZFS pool version 13.

All pools are formatted using this version.
dev> danny From owner-freebsd-fs@FreeBSD.ORG Tue Jun 30 15:59:05 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 867BE106567D; Tue, 30 Jun 2009 15:59:05 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 100E78FC25; Tue, 30 Jun 2009 15:59:04 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApoEAJrTSUqDaFvL/2dsb2JhbADQFoQPBYE3 X-IronPort-AV: E=Sophos;i="4.42,317,1243828800"; d="scan'208";a="39813688" Received: from nile.cs.uoguelph.ca ([131.104.91.203]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 30 Jun 2009 11:59:04 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by nile.cs.uoguelph.ca (Postfix) with ESMTP id 1E6638D4116; Tue, 30 Jun 2009 11:59:04 -0400 (EDT) X-Virus-Scanned: amavisd-new at nile.cs.uoguelph.ca Received: from nile.cs.uoguelph.ca ([127.0.0.1]) by localhost (nile.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id qMeyVdoHsslR; Tue, 30 Jun 2009 11:59:03 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by nile.cs.uoguelph.ca (Postfix) with ESMTP id 01AA38D40FF; Tue, 30 Jun 2009 11:59:03 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id n5UG1LW11918; Tue, 30 Jun 2009 12:01:21 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Tue, 30 Jun 2009 12:01:21 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: Attilio Rao In-Reply-To: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> Message-ID: References: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; 
charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Jun 2009 15:59:06 -0000

On Mon, 29 Jun 2009, Attilio Rao wrote:

>
> While that should be real in principle (immediate shutdown of the fs
> operation and unmounting of the partition) it is totally impossible to
> have it completely unsleeping, so it can happen that also umount -f
> sleeps / delays for some times (example: vflush).
> Currently, umount -f is one of the most complicated thing to handle in
> our VFS because it puts as requirement that vnodes can be reclaimed in
> any moment, adding complexity and possibility for races.
>
> What's the fix for your problem?
>
From other responses, it does look like pursuing this is appropriate
and that current behaviour is considered a bug.

I should have noted in the previous email that I suspected that my simple
patch didn't handle all cases, which I have just determined via testing.

Unfortunately, the thread doing "umount" can also get stuck in an msleep()
while waiting for the mnt_lockref to go to 0, which happens before the
VFS_UNMOUNT() call. (mnt_lockref gets incremented by various system
calls that call vfs_busy().)

I think I can fix this in the experimental nfsv4 client, since it has
a kernel thread that can check for MNTK_UNMOUNTF being set and then
kill off the RPCs in progress, but that won't help the regular client.

It's starting to look like too much work for FreeBSD8, but sounds like
it is worth pursuing. (Apologies to anyone that thought I would have it
all fixed in a day or two.)
rick From owner-freebsd-fs@FreeBSD.ORG Tue Jun 30 16:08:02 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3F020106564A; Tue, 30 Jun 2009 16:08:02 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-bw0-f216.google.com (mail-bw0-f216.google.com [209.85.218.216]) by mx1.freebsd.org (Postfix) with ESMTP id 8F4DC8FC17; Tue, 30 Jun 2009 16:08:01 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: by bwz12 with SMTP id 12so227648bwz.43 for ; Tue, 30 Jun 2009 09:08:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:message-id:subject:from:to:cc :content-type:content-transfer-encoding; bh=TDD6WF2gXafy9CTKQtZFhm7aP2oicr1uifWy2DcgTEA=; b=bYuV7FCOdCDy8tsTrXE29kHVsr+qsAI1X9Xbwu1olYI6i5vFnSXcxAkuTkVDMCGR60 78dCYBCkbtUOgeYSDFZe6XR9Ymky4CYeHgHyX9iRzEryN/XO/HHGRGFcjM50y0xhapRW Rn5FF917qFQKnbfMXHbNtu8wW3scoYvsJt8k4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=j66oOopeg0sCwNkHTfypcesmDftb8R7mZFxJ/AtiKpGiPRaLI5qF9MH87wl9D1rCpX oswUiXlPPP5D0WroM+QbA1jg22ryF7xPjIPhKxGV3s1f0exv+rTC+5JBPyDmkhQI4Cs+ HsIXUDKlPJTH98x7bQsD7OJH4VHEwS2v4W1KQ= MIME-Version: 1.0 Sender: asmrookie@gmail.com Received: by 10.223.113.136 with SMTP id a8mr5381699faq.76.1246378080211; Tue, 30 Jun 2009 09:08:00 -0700 (PDT) In-Reply-To: References: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> Date: Tue, 30 Jun 2009 18:08:00 +0200 X-Google-Sender-Auth: 1a771cec11f35e25 Message-ID: <3bbf2fe10906300908p6b0f314di25bab46b03b5933a@mail.gmail.com> From: Attilio Rao To: Rick Macklem Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: 
freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Jun 2009 16:08:02 -0000

2009/6/30 Rick Macklem :
>
>
> On Mon, 29 Jun 2009, Attilio Rao wrote:
>
>>
>> While that should be real in principle (immediate shutdown of the fs
>> operation and unmounting of the partition) it is totally impossible to
>> have it completely unsleeping, so it can happen that also umount -f
>> sleeps / delays for some times (example: vflush).
>> Currently, umount -f is one of the most complicated thing to handle in
>> our VFS because it puts as requirement that vnodes can be reclaimed in
>> any moment, adding complexity and possibility for races.
>>
>> What's the fix for your problem?
>>
> From other responses, it does look like pursuing this is appropriate
> and that current behaviour is considered a bug.
>
> I should have noted in the previous email that I suspected that my simple
> patch didn't handle all cases, which I have just determined via testing.
>
> Unfortunately, the thread doing "umount" can also get stuck in an msleep()
> while waiting for the mnt_lockref to go to 0, which happens before the
> VFS_UNMOUNT() call. (mnt_lockref gets incremented by various system
> calls that call vfs_busy().)

Sorry for not answering, and I still haven't read this thread at all.
I just wanted to let you know that this msleep is skipped for a forced
unmount; it should happen only in the normal unmount case.

Thanks,
Attilio

-- Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-fs@FreeBSD.ORG Tue Jun 30 19:00:14 2009 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CEF831065673 for ; Tue, 30 Jun 2009 19:00:14 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A2D9B8FC29 for ; Tue, 30 Jun 2009 19:00:14 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n5UJ0ED9063111 for ; Tue, 30 Jun 2009 19:00:14 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n5UJ0E3g063110; Tue, 30 Jun 2009 19:00:14 GMT (envelope-from gnats) Date: Tue, 30 Jun 2009 19:00:14 GMT Message-Id: <200906301900.n5UJ0E3g063110@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Mike Andrews Cc: Subject: Re: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Mike Andrews List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Jun 2009 19:00:19 -0000 The following reply was made to PR kern/135412; it has been noted by GNATS. From: Mike Andrews To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error Date: Tue, 30 Jun 2009 14:40:43 -0400 I can also confirm that this happens with an un-upgraded v6 pool using the v13 code. 
From owner-freebsd-fs@FreeBSD.ORG Tue Jun 30 20:01:18 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0586C1065674; Tue, 30 Jun 2009 20:01:18 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 7FED88FC1D; Tue, 30 Jun 2009 20:01:17 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApoEANsLSkqDaFvH/2dsb2JhbADRaoJBgU4FgTg X-IronPort-AV: E=Sophos;i="4.42,318,1243828800"; d="scan'208";a="37919896" Received: from danube.cs.uoguelph.ca ([131.104.91.199]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 30 Jun 2009 16:01:16 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by danube.cs.uoguelph.ca (Postfix) with ESMTP id 1A2C5108463B; Tue, 30 Jun 2009 16:01:16 -0400 (EDT) X-Virus-Scanned: amavisd-new at danube.cs.uoguelph.ca Received: from danube.cs.uoguelph.ca ([127.0.0.1]) by localhost (danube.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 0qQ09x6AgPl5; Tue, 30 Jun 2009 16:01:15 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by danube.cs.uoguelph.ca (Postfix) with ESMTP id 09DB5108462A; Tue, 30 Jun 2009 16:01:15 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id n5UK3Y916395; Tue, 30 Jun 2009 16:03:34 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Tue, 30 Jun 2009 16:03:34 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: Kostik Belousov In-Reply-To: <20090630193248.GY2884@deviant.kiev.zoral.com.ua> Message-ID: References: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com> <20090630193248.GY2884@deviant.kiev.zoral.com.ua> 
MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Attilio Rao , freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Jun 2009 20:01:18 -0000 On Tue, 30 Jun 2009, Kostik Belousov wrote: >> >> I think I can fix this in the experimental nfsv4 client, since it has >> a kernel thread that can check for MNTK_UNMOUNTF being set and then >> kill off the RPCs in progress, but that won't help the regular client. > This solution sounds good, but see below. > > It may be argued by some people, me included, that umount -f shall not > override any ownership of kernel resources. In particular, you must > not ignore the lockref. Instead, the threads that own misc filesystem > resources, like mount reference counter, locked vnodes etc shall be > weed out of the syscalls. E.g., finishing stalled rpc calls with some > error code that is propagated to return code from vops is good solution. > I think that the thread "fix" above would work this way. Right now, nfs_umount() terminates RPCs in progress for the "-f" case and they return RPC_CANTRECV, which just becomes EACCES at the moment. The problem is that, often, the "umount -f" thread never gets as far as nfs_umount(). All I was thinking of doing, above, is having the kernel thread check for MNTK_UNMOUNTF and then do the same thing. (ie. The NFS VOPs would end up returning EACCES, or whatever Exxx might be preferred.) > Another problem with forced unmounts is that VFS does not block new > threads from arriving into VOPs. When finishing the inflight rpcs, > you may either leave some new rpcs behind or loop infinitely chasing > rpcs that arrive while you finishing old rpcs. 
> The NFS clients already handle this by returning ESTALE at the beginning of nfs_request() without attempting the RPC, if MNTK_UNMOUNTF is set. (Why ESTALE?? Who knows, although I suspect that just about any Exxx will get the job done?) > Umount -f is needed in two different situations, one is normally worked > filesystem that shall be unmounted by administrative request, detaching > any resources opened by application. Second is the last-resort action > when backing storage (server in NFS case, disk for UFS) is misbehaving. > I think we must not break first case for the second. > I think this is what Bruce Evans was referring to. He suggested that there be two flags, like -f and -F, if I understood his post. rick From owner-freebsd-fs@FreeBSD.ORG Tue Jun 30 20:18:12 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7F0BD106566C for ; Tue, 30 Jun 2009 20:18:12 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (skuns.zoral.com.ua [91.193.166.194]) by mx1.freebsd.org (Postfix) with ESMTP id F3E488FC16 for ; Tue, 30 Jun 2009 20:18:11 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id n5UJWmRF082104 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 30 Jun 2009 22:32:48 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.3/8.14.3) with ESMTP id n5UJWmZx054697; Tue, 30 Jun 2009 22:32:48 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.3/8.14.3/Submit) id n5UJWmsn054696; Tue, 30 Jun 2009 22:32:48 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: 
kostik set sender to kostikbel@gmail.com using -f
Date: Tue, 30 Jun 2009 22:32:48 +0300
From: Kostik Belousov
To: Rick Macklem
Message-ID: <20090630193248.GY2884@deviant.kiev.zoral.com.ua>
References: <3bbf2fe10906290256x4bfbe263jccef017a557f9410@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="3/zQ9zHZ+bvXaNu1"
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.1 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
Cc: Attilio Rao, freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: Re: umount -f implementation
X-List-Received-Date: Tue, 30 Jun 2009 20:18:12 -0000

--3/zQ9zHZ+bvXaNu1
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Jun 30, 2009 at 12:01:21PM -0400, Rick Macklem wrote:
>
>
> On Mon, 29 Jun 2009, Attilio Rao wrote:
>
> >
> >While that should be real in principle (immediate shutdown of the fs
> >operation and unmounting of the partition) it is totally impossible to
> >have it completely unsleeping, so it can happen that also umount -f
> >sleeps / delays for some times (example: vflush).
> >Currently, umount -f is one of the most complicated thing to handle in
> >our VFS because it puts as requirement that vnodes can be reclaimed in
> >any moment, adding complexity and possibility for races.
> >
> >What's the fix for your problem?
> >
> From other responses, it does look like pursuing this is appropriate
> and that current behaviour is considered a bug.
>
> I should have noted in the previous email that I suspected that my simple
> patch didn't handle all cases, which I have just determined via testing.
>
> Unfortunately, the thread doing "umount" can also get stuck in an msleep()
> while waiting for the mnt_lockref to go to 0, which happens before the
> VFS_UNMOUNT() call. (mnt_lockref gets incremented by various system
> calls that call vfs_busy().)
>
> I think I can fix this in the experimental nfsv4 client, since it has
> a kernel thread that can check for MNTK_UNMOUNTF being set and then
> kill off the RPCs in progress, but that won't help the regular client.
This solution sounds good, but see below.

>
> It's starting to look like too much work for FreeBSD8, but sounds like
> it is worth pursuing. (Apologies to anyone that thought I would have it
> all fixed in a day or two.)

It may be argued by some people, me included, that umount -f shall not
override any ownership of kernel resources. In particular, you must
not ignore the lockref. Instead, the threads that own misc filesystem
resources, like mount reference counter, locked vnodes etc, shall be
weeded out of the syscalls. E.g., finishing stalled rpc calls with some
error code that is propagated to the return code from vops is a good
solution.

Quite similar problems happen with SIGSTOP and intr NFS mounts. You saw
the proposed solution, which is quite similar: it forces the threads
owning the resources to progress to the syscall boundary.

Another problem with forced unmounts is that VFS does not block new
threads from arriving into VOPs. When finishing the inflight rpcs,
you may either leave some new rpcs behind or loop infinitely chasing
rpcs that arrive while you are finishing old rpcs.

A half-measure is filesystem suspension, which keeps operations
that modify the filesystem from entering VOPs. UFS uses suspension for
unmounts and rw->ro remounts.

Umount -f is needed in two different situations. One is a normally
working filesystem that shall be unmounted by administrative request,
detaching any resources opened by applications.
Second is the last-resort action when backing storage (server in NFS
case, disk for UFS) is misbehaving. I think we must not break the first
case for the second.

--3/zQ9zHZ+bvXaNu1--

From owner-freebsd-fs@FreeBSD.ORG Tue Jun 30 22:37:42 2009
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 81082106566C; Tue, 30 Jun 2009 22:37:42 +0000 (UTC) (envelope-from mckusick@mckusick.com)
Received: from chez.mckusick.com (chez.mckusick.com [64.81.247.49]) by mx1.freebsd.org (Postfix) with ESMTP id 5C3FE8FC1F; Tue, 30 Jun 2009 22:37:42 +0000 (UTC) (envelope-from mckusick@mckusick.com)
Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.2/8.14.2) with ESMTP id n5ULwdxk002480; Tue, 30 Jun 2009 14:58:39 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com)
Message-Id: <200906302158.n5ULwdxk002480@chez.mckusick.com>
To: Attilio Rao
In-reply-to: <3bbf2fe10906300908p6b0f314di25bab46b03b5933a@mail.gmail.com>
Date: Tue, 30 Jun 2009 14:58:39 -0700
From: Kirk McKusick
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: Re: umount -f implementation
X-List-Received-Date: Tue, 30 Jun 2009 22:37:43 -0000

Just for the history books, there originally were two forms of
forced unmount: the gentle force (-f) and the brute force (-F)
unmount. The -f unmount flushes out all the dirty buffers so that
when the unmount completes no data is lost and the filesystem is
in a consistent state.
The -F unmount invalidates and discards all the dirty buffers without attempting to do any I/O on them. The result is lost data and a possibly inconsistent filesystem. But it will get the job done even if the disk has died or the server has gone away. For reasons that I never tracked down, the -F unmount option was never incorporated into FreeBSD when they did the merge from 4.4BSD-Lite II, so that functionality never made it into the system. It is actually much easier to do than unmount -f since you just walk through and set B_INVAL and B_ERROR on all the dirty buffers for that filesystem. The problem with unmount -f is that it will hang if the server is gone since it will insist on pushing back all the dirty buffers. Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Wed Jul 1 01:09:59 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7B15F106564A; Wed, 1 Jul 2009 01:09:59 +0000 (UTC) (envelope-from jhs@berklix.com) Received: from flat.berklix.org (flat.berklix.org [83.236.223.115]) by mx1.freebsd.org (Postfix) with ESMTP id F258A8FC0C; Wed, 1 Jul 2009 01:09:58 +0000 (UTC) (envelope-from jhs@berklix.com) Received: from park.js.berklix.net (p549A613B.dip.t-dialin.net [84.154.97.59]) (authenticated bits=0) by flat.berklix.org (8.13.8/8.13.8) with ESMTP id n610nGf4019454; Wed, 1 Jul 2009 02:49:17 +0200 (CEST) (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41]) by park.js.berklix.net (8.13.8/8.13.8) with ESMTP id n610n3gr026985; Wed, 1 Jul 2009 02:49:06 +0200 (CEST) (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (localhost [127.0.0.1]) by fire.js.berklix.net (8.14.3/8.14.3) with ESMTP id n610mPem058027; Wed, 1 Jul 2009 02:48:30 +0200 (CEST) (envelope-from jhs@fire.js.berklix.net) Message-Id: <200907010048.n610mPem058027@fire.js.berklix.net> To: Kirk McKusick From: "Julian H. 
Stacey" Organization: http://www.berklix.com BSD Unix Linux Consultancy, Munich Germany User-agent: EXMH on FreeBSD http://www.berklix.com/free/ X-URL: http://www.berklix.com In-reply-to: Your message "Tue, 30 Jun 2009 14:58:39 PDT." <200906302158.n5ULwdxk002480@chez.mckusick.com> Date: Wed, 01 Jul 2009 02:48:25 +0200 Sender: jhs@berklix.com Cc: Attilio Rao , freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Jul 2009 01:09:59 -0000 Kirk McKusick wrote: > forced unmounts. The gentle force (-f) and the brute force (-F) > unmount. -F Would also be nice for devd.conf detach, for when people forget & pull a USB stick without unmounting first. Better a corrupt stick than a crashed OS. Cheers, Julian -- Julian Stacey: BSD Unix Linux C Sys. Eng. Consultant Munich http://berklix.com Mail in plain ASCII text, HTML & Base64 are spam. 
http://asciiribbon.org From owner-freebsd-fs@FreeBSD.ORG Wed Jul 1 12:50:05 2009 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EA2E4106564A for ; Wed, 1 Jul 2009 12:50:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id BE97F8FC08 for ; Wed, 1 Jul 2009 12:50:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n61Co5fD034886 for ; Wed, 1 Jul 2009 12:50:05 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n61Co51P034885; Wed, 1 Jul 2009 12:50:05 GMT (envelope-from gnats) Date: Wed, 1 Jul 2009 12:50:05 GMT Message-Id: <200907011250.n61Co51P034885@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: dfilter@FreeBSD.ORG (dfilter service) Cc: Subject: Re: kern/135412: commit references a PR X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dfilter service List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Jul 2009 12:50:06 -0000 The following reply was made to PR kern/135412; it has been noted by GNATS. From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/135412: commit references a PR Date: Wed, 1 Jul 2009 12:44:37 +0000 (UTC) Author: avg Date: Wed Jul 1 12:44:23 2009 New Revision: 195236 URL: http://svn.freebsd.org/changeset/base/195236 Log: MFC 185586 (kan): Change nfsserver slightly so that it does not trip over the timestamp validation code on ZFS. This should fix O_CREAT|O_EXCL open on NFS where a server is 64-bit with v13 ZFS code. 
  PR:		kern/135412
  Pointed out by:	Jaakko Heinonen
  Tested by:	Jaakko Heinonen, Danny Braniss

Modified:
  stable/7/sys/   (props changed)
  stable/7/sys/contrib/pf/   (props changed)
  stable/7/sys/nfsserver/nfs_serv.c

Modified: stable/7/sys/nfsserver/nfs_serv.c
==============================================================================
--- stable/7/sys/nfsserver/nfs_serv.c	Wed Jul  1 12:36:10 2009	(r195235)
+++ stable/7/sys/nfsserver/nfs_serv.c	Wed Jul  1 12:44:23 2009	(r195236)
@@ -1656,13 +1656,12 @@ nfsrv_create(struct nfsrv_descript *nfsd
 	caddr_t bpos;
 	int error = 0, rdev, len, tsize, dirfor_ret = 1, diraft_ret = 1;
 	int v3 = (nfsd->nd_flag & ND_NFSV3), how, exclusive_flag = 0;
-	caddr_t cp;
 	struct mbuf *mb, *mreq;
 	struct vnode *dirp = NULL;
 	nfsfh_t nfh;
 	fhandle_t *fhp;
 	u_quad_t tempsize;
-	u_char cverf[NFSX_V3CREATEVERF];
+	struct timespec cverf;
 	struct mount *mp = NULL;
 	int tvfslocked;
 	int vfslocked;
@@ -1741,8 +1740,11 @@ nfsrv_create(struct nfsrv_descript *nfsd
 			nfsm_srvsattr(vap);
 			break;
 		case NFSV3CREATE_EXCLUSIVE:
-			cp = nfsm_dissect_nonblock(caddr_t, NFSX_V3CREATEVERF);
-			bcopy(cp, cverf, NFSX_V3CREATEVERF);
+			tl = nfsm_dissect_nonblock(u_int32_t *,
+			    NFSX_V3CREATEVERF);
+			/* Unique bytes, endianness is not important.
*/
+			cverf.tv_sec  = tl[0];
+			cverf.tv_nsec = tl[1];
 			exclusive_flag = 1;
 			break;
 		};
@@ -1788,8 +1790,7 @@ nfsrv_create(struct nfsrv_descript *nfsd
 			if (exclusive_flag) {
 				exclusive_flag = 0;
 				VATTR_NULL(vap);
-				bcopy(cverf, (caddr_t)&vap->va_atime,
-				    NFSX_V3CREATEVERF);
+				vap->va_atime = cverf;
 				error = VOP_SETATTR(nd.ni_vp, vap, cred,
 				    td);
 			}
@@ -1873,7 +1874,7 @@ nfsrv_create(struct nfsrv_descript *nfsd
 	}
 	if (v3) {
 		if (exclusive_flag && !error &&
-		    bcmp(cverf, (caddr_t)&vap->va_atime, NFSX_V3CREATEVERF))
+		    bcmp(&cverf, &vap->va_atime, sizeof (cverf)))
 			error = EEXIST;
 		if (dirp == nd.ni_dvp)
 			diraft_ret = VOP_GETATTR(dirp, &diraft, cred, td);
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Wed Jul 1 17:27:24 2009
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6578710656B3; Wed, 1 Jul 2009 17:27:24 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id DA9D88FC1B; Wed, 1 Jul 2009 17:27:23 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApoEAAY5S0qDaFvG/2dsb2JhbADRF4QRBQ
X-IronPort-AV: E=Sophos;i="4.42,326,1243828800"; d="scan'208";a="39941217"
Received: from amazon.cs.uoguelph.ca ([131.104.91.198]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 01 Jul 2009 13:27:23 -0400
Received: from localhost (localhost.localdomain [127.0.0.1]) by amazon.cs.uoguelph.ca (Postfix) with ESMTP id 0E772210159; Wed, 1 Jul 2009 13:27:23 -0400 (EDT)
X-Virus-Scanned: amavisd-new at amazon.cs.uoguelph.ca
Received: from amazon.cs.uoguelph.ca ([127.0.0.1]) by localhost (amazon.cs.uoguelph.ca [127.0.0.1])
(amavisd-new, port 10024) with ESMTP id 7pNiV58pMoYq; Wed, 1 Jul 2009 13:27:22 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by amazon.cs.uoguelph.ca (Postfix) with ESMTP id 3B1B42100E1; Wed, 1 Jul 2009 13:27:22 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id n61HTiq18898; Wed, 1 Jul 2009 13:29:44 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Wed, 1 Jul 2009 13:29:44 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: "Julian H. Stacey" In-Reply-To: <200907010048.n610mPem058027@fire.js.berklix.net> Message-ID: References: <200907010048.n610mPem058027@fire.js.berklix.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Kirk McKusick , Attilio Rao , freebsd-current@freebsd.org, freebsd-fs@freebsd.org Subject: Re: umount -f implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Jul 2009 17:27:25 -0000 On Wed, 1 Jul 2009, Julian H. Stacey wrote: > Kirk McKusick wrote: >> forced unmounts. The gentle force (-f) and the brute force (-F) >> unmount. > > -F Would also be nice for devd.conf detach, for when people > forget & pull a USB stick without unmounting first. > Better a corrupt stick than a crashed OS. > All I'll add is, for the experimental nfs client, if both semantics are desired (and, imho they are), there will need to be separate flags to indicate whether or not to terminate RPCs in progress. So, it seems that there is interest in a separate "umount -F" to handle the case of failed storage (disk subsystem, NAS server down,...). Is there anyone who is opposed to my pursuing this after FreeBSD-CURRENT branches from 8? (I can do the experimental nfs client + some testing. 
Hopefully others can help with the generic VFS issues and other file systems.) rick, who obviously doesn't have as good a memory as Kirk's:-) ps: Unfortunately Solaris uses "-F" for something entirely different, so feel free to suggest other flag values if you think that is a concern. From owner-freebsd-fs@FreeBSD.ORG Thu Jul 2 10:41:41 2009 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 19B7E1065673; Thu, 2 Jul 2009 10:41:41 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id E3BF38FC1E; Thu, 2 Jul 2009 10:41:40 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (linimon@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n62AfeVd089457; Thu, 2 Jul 2009 10:41:40 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n62AfegV089453; Thu, 2 Jul 2009 10:41:40 GMT (envelope-from linimon) Date: Thu, 2 Jul 2009 10:41:40 GMT Message-Id: <200907021041.n62AfegV089453@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-amd64@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/136218: [zfs] Exported ZFS pools can't be imported into (Open)Solaris X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Jul 2009 10:41:41 -0000 Old Synopsis: Exported ZFS pools can't be imported into (Open)Solaris New Synopsis: [zfs] Exported ZFS pools can't be imported into (Open)Solaris Responsible-Changed-From-To: freebsd-amd64->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Thu Jul 2 10:41:22 UTC 2009 
Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=136218 From owner-freebsd-fs@FreeBSD.ORG Thu Jul 2 15:25:50 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3B7EE1065670 for ; Thu, 2 Jul 2009 15:25:50 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id E4BC18FC24 for ; Thu, 2 Jul 2009 15:25:49 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AiQGAGduTEqDaFvJ/2dsb2JhbACObwHBXoQRBQ X-IronPort-AV: E=Sophos;i="4.42,335,1243828800"; d="scan'208";a="40022957" Received: from ganges.cs.uoguelph.ca ([131.104.91.201]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 02 Jul 2009 11:25:49 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by ganges.cs.uoguelph.ca (Postfix) with ESMTP id 29B95FB808B for ; Thu, 2 Jul 2009 11:25:49 -0400 (EDT) X-Virus-Scanned: amavisd-new at ganges.cs.uoguelph.ca Received: from ganges.cs.uoguelph.ca ([127.0.0.1]) by localhost (ganges.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id YnkYWJwUctGt for ; Thu, 2 Jul 2009 11:25:48 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by ganges.cs.uoguelph.ca (Postfix) with ESMTP id 30778FB8081 for ; Thu, 2 Jul 2009 11:25:48 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id n62FSBR22578 for ; Thu, 2 Jul 2009 11:28:11 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Thu, 2 Jul 2009 11:28:11 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: freebsd-fs@freebsd.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; 
format=flowed Subject: documenting setup of Kerberized NFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Jul 2009 15:25:50 -0000 I'm not even sure if this is the right place to ask the question, but... I'm wondering what would be an appropriate place to document how to set up/use Kerberized NFS? I'm far from a Kerberos wizard, but I have picked up some tricks that might be useful to others. At this point, I think it would be too informal for something like a man page. Maybe a wiki or similar? (Since wiki.freebsd.org is for developers and not users, it seems that isn't the right place.) Maybe I could start with just a brain dump posting to this list? (Basically anywhere that the search engines can find, would be a start.) Thanks in advance for any suggestions, rick From owner-freebsd-fs@FreeBSD.ORG Fri Jul 3 12:29:47 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 58DEF10656C7 for ; Fri, 3 Jul 2009 12:29:47 +0000 (UTC) (envelope-from DS@praxisvermittlung24.de) Received: from mail.sw-sec.de (mail.sw-sec.de [212.204.60.86]) by mx1.freebsd.org (Postfix) with ESMTP id 8B76E8FC14 for ; Fri, 3 Jul 2009 12:29:46 +0000 (UTC) (envelope-from DS@praxisvermittlung24.de) Received: from [192.168.2.116] (p4FEBDD50.dip.t-dialin.net [79.235.221.80]) (authenticated bits=0) by mail.sw-sec.de (8.12.10/8.12.10) with ESMTP id n63CHUPX036684 for ; Fri, 3 Jul 2009 14:17:30 +0200 (CEST) (envelope-from DS@praxisvermittlung24.de) Message-ID: <4A4DF6D8.4040304@praxisvermittlung24.de> Date: Fri, 03 Jul 2009 14:17:28 +0200 From: Daniel Seuffert Organization: Seuffert & Waidmann User-Agent: Thunderbird 2.0.0.22 (Windows/20090605) MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: 
<20090703120023.7E40B1065713@hub.freebsd.org> In-Reply-To: <20090703120023.7E40B1065713@hub.freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Subject: documenting setup of Kerberized NFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: DS@praxisvermittlung24.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Jul 2009 12:29:50 -0000 > From: Rick Macklem > Subject: documenting setup of Kerberized NFS > To: freebsd-fs@freebsd.org > Message-ID: > Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed > > I'm not even sure if this is the right place to ask the question, but... > > I'm wondering what would be an appropriate place to document how to > set up/use Kerberized NFS? I'm far from a Kerberos wizard, but I have > picked up some tricks that might be useful to others. At this point, > I think it would be too informal for something like a man page. > Maybe a wiki or similar? (Since wiki.freebsd.org is for developers and > not users, it seems that isn't the right place.) > > Maybe I could start with just a brain dump posting to this list? > (Basically anywhere that the search engines can find, would be a start.) > > Thanks in advance for any suggestions, rick Hi Rick, I suggest using a public wiki and adding it to http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/network-nfs.html once it is considered mature enough. Maybe freebsd-doc would be a better place to ask for feedback. 
Best regards, Daniel From owner-freebsd-fs@FreeBSD.ORG Fri Jul 3 16:29:32 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B57C2106564A for ; Fri, 3 Jul 2009 16:29:32 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 48DE68FC14 for ; Fri, 3 Jul 2009 16:29:32 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjAHACPPTUqDaFvK/2dsb2JhbACOdQGvTwECj3CCWAEDgTYFgTo X-IronPort-AV: E=Sophos;i="4.42,343,1243828800"; d="scan'208";a="40136555" Received: from fraser.cs.uoguelph.ca ([131.104.91.202]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 03 Jul 2009 12:29:30 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id DD857109C2AF for ; Fri, 3 Jul 2009 12:29:30 -0400 (EDT) X-Virus-Scanned: amavisd-new at fraser.cs.uoguelph.ca Received: from fraser.cs.uoguelph.ca ([127.0.0.1]) by localhost (fraser.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id JL5+ph-7F9gf for ; Fri, 3 Jul 2009 12:29:30 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id 15954109C24A for ; Fri, 3 Jul 2009 12:29:30 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id n63GVvM08947 for ; Fri, 3 Jul 2009 12:31:57 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Fri, 3 Jul 2009 12:31:57 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: freebsd-fs@freebsd.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Subject: Kerberized NFS doc (raw text before I get 
it on a wiki) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Jul 2009 16:29:33 -0000

Ok, here's the raw text that I plan on putting on a wiki as a starting point for documenting how to set up Kerberized NFS on FreeBSD8. Any comments w.r.t. it are welcome. I'll probably put it at http://code.google.com/p/macnfsv4 under the wiki tab in a day or so.

Hopefully it is of some use, rick

--- first draft of doc ---

Setting up Kerberized NFS (more correctly: NFS using RPCSEC_GSS authentication via the kerberos5 mechanism) on FreeBSD8/CURRENT:

Before we get started, here are a few preliminary things you need to know:
(Let's assume the default Realm is CIS.UOGUELPH.CA with systems called nfs-client.cis.uoguelph.ca and nfs-server.cis.uoguelph.ca, plus a user called ricktst that exists in the password database on all systems with uid == 502.)

In the GSSAPI there are two kinds of principal names, User and Host_Based.

A User principal name normally refers to a user and looks like:
    ricktst   <-- which becomes ricktst@CIS.UOGUELPH.CA in KerberosV
It normally has credentials in a credentials cache file (/tmp/krb5cc_502) created by kinit or during the user's login process.

A Host_Based principal name also has the fully qualified host domain name in it (so it can only be used on that host) and looks like:
    nfs@nfs-server.cis.uoguelph.ca   <-- which becomes nfs/nfs-server.cis.uoguelph.ca@CIS.UOGUELPH.CA in KerberosV
and normally has credentials in a keytab file created on the KDC and transferred to the system via some secure means. (Since this principal will only work on the system called nfs-server.cis.uoguelph.ca, the damage done if the keytab file is compromised on nfs-server.cis.uoguelph.ca is limited to that system, which has already been compromised anyhow.)
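
As a minimal sketch of the two name forms above: the GSSAPI library does the real translation, but the string mapping can be mimicked in a few lines of sh. The function name gss_to_krb5 is hypothetical (it is not part of any Kerberos or FreeBSD tool), and the realm and host names are just the example values assumed in this document.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: write a GSSAPI name the
# way it would appear as a KerberosV principal in the given realm.
#   User name        ricktst        -> ricktst@REALM
#   Host_Based name  service@fqdn   -> service/fqdn@REALM
gss_to_krb5() {
    name=$1
    realm=$2
    case $name in
        *@*)
            # Host_Based: split "service@host" at the first "@".
            service=${name%%@*}
            host=${name#*@}
            printf '%s/%s@%s\n' "$service" "$host" "$realm"
            ;;
        *)
            # User: only the realm is appended.
            printf '%s@%s\n' "$name" "$realm"
            ;;
    esac
}

gss_to_krb5 ricktst CIS.UOGUELPH.CA
gss_to_krb5 nfs@nfs-server.cis.uoguelph.ca CIS.UOGUELPH.CA
```

With the example names, the two calls print ricktst@CIS.UOGUELPH.CA and nfs/nfs-server.cis.uoguelph.ca@CIS.UOGUELPH.CA, matching the KerberosV forms shown above.
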
Host based principals have the advantage that their credentials can always be refreshed until a new keytab entry is generated for the principal on the KDC.

For RPCSEC_GSS, there are 3 types of service which, for KerberosV, are:
krb5  - Use KerberosV for user authentication, but only protect the RPC header from compromise.
krb5i - Use KerberosV for user authentication, but also use encrypted checksums on the RPC data to protect against "man in the middle" attacks involving replacement of the RPC data.
krb5p - Use KerberosV for user authentication, but also encrypt the RPC data, so that it isn't on the wire in clear text.

Setup:
1 - The client and server system(s) will need to be running kernels built with the following in their kernel config(5) files:
        options KGSSAPI
        device crypto
2 - KerberosV will need to be configured on all the systems, with them all in the same default REALM (I believe cross-REALM authentication isn't going to work well, since the uid<->user-principal-name translations won't work out). This can be confirmed via kinit working ok on them.
3 - All systems will need to be running the gssd daemon. The daemon can be started at boot time by setting:
        gssd_enable="YES"
    in the /etc/rc.conf file.
4 - The server(s) will need a host based entry for "nfs" in their default keytab file (see above).
5 - The server(s) will need "-sec" options added to the lines in /etc/exports for the file systems being exported for Kerberized NFS access. If AUTH_SYS access is to be allowed as well, "sys" must be specified too. For example:
        /exp -sec=sys:krb5:krb5i:krb5p nfs-client.cis.uoguelph.ca
        /exp -sec=krb5i:krb5p
    would allow nfs-client.cis.uoguelph.ca to use any authentication, but other clients would be required to use krb5i or krb5p.
    For NFSv4, the "V4:" line(s) will have to have the "-sec" option added, as well. This controls what authentication methods can be used for the system related operations that do not have any associated file handle.
It might look like:
        V4: / -sec=sys:krb5:krb5i:krb5p
    (See "man exports" and "man nfsv4" for more info.)
6 - In the client, for NFSv3, the mount can usually be done by "root" using AUTH_SYS, since that is what most mountd servers handle. However, accesses to the file system may require that the users have valid TGTs in their credential cache file. For example (on the client):
        # mount -t nfs -o nfsv3,sec=krb5 nfs-server.cis.uoguelph.ca:/exp /mnt
        - where "root" does not have a TGT
    Then "ricktst" can:
        % kinit
        - enter KerberosV password when prompted
        % cd /mnt
        - and use the file system until the TGT expires
    (Exactly what is allowed is determined by the NFS server, as above.)

For NFSv4, the situation is somewhat different, since there is no mountd protocol and there are system operations related to Opens and Locks that need to keep working until the file system is dismounted. There are basically two ways to do this (with a third variant):

(A) - A user with a valid TGT may do the mount. For this case, the file system should be dismounted before the TGT expires. To do this:
        # sysctl vfs.usermount=1   <-- done by root
        - Then, as the user logged in with a valid TGT
        % mount -t nfs -o nfsv4,sec=krb5i nfs-server.cis.uoguelph.ca:/exp mydir
        - where "mydir" is owned by the user
        - This will generate a warning about the mount table not being updated, but that isn't a serious problem imho.
    There are open source programs like "krenew" that can be used to help ensure that the TGT doesn't expire before dismounting one of these mounts. This is the only variant supported by FreeBSD8 "out of the box".
    (The entire mount path must be allowed to use the specified authentication flavour for the mount to work. In particular, Netapp filers tend to only allow AUTH_SYS (aka "sys") for the root directory by default and this must be changed for the mount to work.
    The FreeBSD8 NFSv4 client does not know how to switch authentication flavours dynamically, based on the server returning NFS4ERR_WRONGSEC (err# 10016).)

(B) - Use a host based principal name in the default keytab file on the client, to allow the mount to be done by "root". This mount should continue to work, since the credentials for the host based principal in the default keytab file can continue to be refreshed. To do this form of NFSv4 mount, the kernel on the client will have to be patched with
        ftp://ftp.cis.uoguelph.ca/pub/nfsv4/freebsd-rpcsec.patch
    (available via anonymous ftp). It will also need an entry like:
        nfs/nfs-client.cis.uoguelph.ca@CIS.UOGUELPH.CA
    in the client machine's default keytab file. (For the client, the first component doesn't need to be "nfs". For example, Solaris10 typically uses "root".)
    Then you must set the sysctl variable vfs.rpcsec.keytab_enctype to the numeric value for the encryption type used when creating the keytab entry on the KDC. (The patch is not in FreeBSD8 partly because it requires this pesky business of setting the encryption type. I haven't figured out a way to make it work without doing gss_krb5_set_allowable_enctypes() calls for every client side credential acquired.) The numeric values for the encryption types look like:
        #define ETYPE_DES_CBC_CRC 1
    and can be found in sys/kgssapi/krb5/kcrypto.h. (The sysctl must be set before the gssd daemon is started on the system.)
    Then, the mount command looks like:
        # mount -t nfs -o nfsv4,sec=krb5,gssname=nfs nfs-server.cis.uoguelph.ca:/exp /mnt
    Now, the system operations will be performed using the host based credential, but other operations will be performed using credentials for the appropriate user, such as ricktst, via their TGT. (This is the style used by the Solaris10 and Linux clients.)

(C) - Is a variant of (B), where the "allgssname" mount option is specified, which means that all accesses to the file system use the host based principal name in the default keytab file.
    This case may be useful when the client is actually some kind of system that runs batch processes, where the users are not normally logged in with valid TGTs when running them. The mount command would look like:
        # mount -t nfs -o nfsv4,sec=krb5i,gssname=root,allgssname nfs-server.cis.uoguelph.ca:/exp /mnt

Here are some gotchas to be aware of:
- The time of day clocks for all systems must be synchronized to within the clock skew specified in /etc/krb5.conf.
- KerberosV principal names are case sensitive, although DNS names are not. The simplest way to avoid grief is to use all lower case characters in your DNS host names and all upper case characters for your KerberosV REALM name.
- The host name resolver functions must return the fully qualified host name, i.e. nfs-server.cis.uoguelph.ca and not nfs-server. If you are using /etc/hosts, put the fully qualified name first, like:
        131.104.48.99 nfs-server.cis.uoguelph.ca nfs-server
- What to do w.r.t. ticket encryption types is beyond my limited KerberosV expertise; however, if you stick with des-cbc-crc initially and get that working, you can try others later. I have been told that some Netapp filers only handle des-cbc-crc. (When creating keytab file entries on the KDC, be careful to specify the chosen encryption type and then set that type as the first/only type for the "default_etypes" entry in /etc/krb5.conf.) Also, if you are using the host based principal patch, be sure to set the encryption type used for the keytab entry, as above.
  If using des-cbc-crc:
        # sysctl vfs.rpcsec.keytab_enctype=1

Good luck with it, rick

From owner-freebsd-fs@FreeBSD.ORG Sat Jul 4 03:00:19 2009 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8EDE41065670 for ; Sat, 4 Jul 2009 03:00:19 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 75C278FC08 for ; Sat, 4 Jul 2009 03:00:19 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n6430JfT068925 for ; Sat, 4 Jul 2009 03:00:19 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n6430JTf068924; Sat, 4 Jul 2009 03:00:19 GMT (envelope-from gnats) Date: Sat, 4 Jul 2009 03:00:19 GMT Message-Id: <200907040300.n6430JTf068924@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: "James E. Flemer" Cc: Subject: Re: kern/124621: [ext3] [patch] Cannot mount ext2fs partition X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: "James E. Flemer" List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 Jul 2009 03:00:19 -0000

The following reply was made to PR kern/124621; it has been noted by GNATS.

From: "James E. 
Flemer" To: paulf@free.fr, bug-followup@FreeBSD.org Cc: Subject: Re: kern/124621: [ext3] [patch] Cannot mount ext2fs partition Date: Fri, 3 Jul 2009 19:23:53 -0700

Environment: FreeBSD cage.local 7.1-STABLE FreeBSD 7.1-STABLE #1: Thu Jan 8 20:10:28 PST 2009 root@cage.local:/mnt/space/usr/obj.i386/mnt/space/usr/src/sys/CAGE7-SMP i386

Just another data point: the patch (http://pflog.net/~floyd/ext2fs.diff, md5:3dd5125eeb591e9c53930beb216d523e) fixes mounting an ext3 partition from an Ubuntu 9.04 install. Confirmed with tune2fs that the partition has a 256-byte inode size. Tested by applying the patch and rebuilding in /usr/src/sys/modules/ext2fs.

From owner-freebsd-fs@FreeBSD.ORG Sat Jul 4 21:26:51 2009 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 601B21065672; Sat, 4 Jul 2009 21:26:51 +0000 (UTC) (envelope-from jilles@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 353C48FC13; Sat, 4 Jul 2009 21:26:51 +0000 (UTC) (envelope-from jilles@FreeBSD.org) Received: from freefall.freebsd.org (jilles@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n64LQpU9025298; Sat, 4 Jul 2009 21:26:51 GMT (envelope-from jilles@freefall.freebsd.org) Received: (from jilles@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n64LQoBC025291; Sat, 4 Jul 2009 21:26:50 GMT (envelope-from jilles) Date: Sat, 4 Jul 2009 21:26:50 GMT Message-Id: <200907042126.n64LQoBC025291@freefall.freebsd.org> To: danny@cs.huji.ac.il, jilles@FreeBSD.org, freebsd-fs@FreeBSD.org From: jilles@FreeBSD.org Cc: Subject: Re: kern/135412: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) 
returns io error X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 Jul 2009 21:26:51 -0000 Synopsis: [zfs] [nfs] zfs(v13)+nfs and open(..., O_WRONLY|O_CREAT|O_EXCL, ...) returns io error State-Changed-From-To: open->closed State-Changed-By: jilles State-Changed-When: Sat Jul 4 21:26:50 UTC 2009 State-Changed-Why: Fix committed, no other branches affected. http://www.freebsd.org/cgi/query-pr.cgi?pr=135412 From owner-freebsd-fs@FreeBSD.ORG Sat Jul 4 21:56:22 2009 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4FD621065674 for ; Sat, 4 Jul 2009 21:56:22 +0000 (UTC) (envelope-from dan.naumov@gmail.com) Received: from an-out-0708.google.com (an-out-0708.google.com [209.85.132.242]) by mx1.freebsd.org (Postfix) with ESMTP id 099558FC15 for ; Sat, 4 Jul 2009 21:56:21 +0000 (UTC) (envelope-from dan.naumov@gmail.com) Received: by an-out-0708.google.com with SMTP id d14so1393494and.13 for ; Sat, 04 Jul 2009 14:56:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:content-type :content-transfer-encoding; bh=+ZDIe9ZYrbz1wmjV5NDo85U8VnEdA8/pwZd/LKviblQ=; b=G8EUsFeTHEU1fuyo7gtmHhPiW6EeMJt6GNZb0JvienynT2NQC+EVcfAbgf7Lz7d0qu c/rkHGMbliw+4ii/i2FvbYp9mBew0q/WA9jiI7B59Hd7k6/dHrmk6b5639fQrrShTNhi 3m/1zUY5bz1Xcxj8hjnmfsHGSTwzJRr579PTQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; b=xKI7il0c54CmIJN7IvqX+ldcQMvvYKKcORkDK4oaSxWI2BDy42sxpsqGZDCDO4kU7Z LZjphZgWrUpn0LCsXhHOd6tgd1Aezy9ycNdbnA3LAueBma5OJQsh+xSXMMDaIUwwiVS9 1T4yezWA9FUjzqTEiolgTa1JD/s3tUJhMl0js= 
MIME-Version: 1.0 Received: by 10.100.108.8 with SMTP id g8mr5111771anc.66.1246744581459; Sat, 04 Jul 2009 14:56:21 -0700 (PDT) In-Reply-To: References: Date: Sun, 5 Jul 2009 00:56:21 +0300 Message-ID: From: Dan Naumov To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: ZFS and df weirdness X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 Jul 2009 21:56:22 -0000

Hello list.

I have a single 2tb disk used on a 7.2-release/amd64 system with a small part of it given to UFS and most of the disk given to a single "simple" zfs pool with several filesystems without redundancy. I've noticed a really weird thing regarding what "df" reports regarding the "total space" of one of my filesystems:

atom# zpool list
NAME    SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
tank   1.80T   294G  1.51T  15%  ONLINE  -

atom# zfs list
NAME              USED  AVAIL  REFER  MOUNTPOINT
tank              294G  1.48T    18K  none
tank/DATA         292G  1.48T   292G  /DATA
tank/home         216K  1.48T    21K  /home
tank/home/jago    132K  1.48T   132K  /home/jago
tank/home/karni    62K  1.48T    62K  /home/karni
tank/usr         1.33G  1.48T    18K  none
tank/usr/local    455M  1.48T   455M  /usr/local
tank/usr/obj       18K  1.48T    18K  /usr/obj
tank/usr/ports    412M  1.48T   412M  /usr/ports
tank/usr/src      495M  1.48T   495M  /usr/src
tank/var          320K  1.48T    18K  none
tank/var/log      302K  1.48T   302K  /var/log

atom# df
Filesystem        1K-blocks      Used      Avail Capacity  Mounted on
/dev/ad12s1a       16244334   1032310   13912478     7%    /
devfs                     1         1          0   100%    /dev
linprocfs                 4         4          0   100%    /usr/compat/linux/proc
tank/DATA        1897835904 306397056 1591438848    16%    /DATA
tank/home        1591438848         0 1591438848     0%    /home
tank/home/jago   1591438976       128 1591438848     0%    /home/jago
tank/home/karni  1591438848         0 1591438848     0%    /home/karni
tank/usr/local   1591905024    466176 1591438848     0%    /usr/local
tank/usr/obj     1591438848         0 1591438848     0%    /usr/obj
tank/usr/ports   1591860864    422016 1591438848     0%    /usr/ports
tank/usr/src     1591945600    506752 1591438848     0%    /usr/src
tank/var/log     1591439104       256 1591438848     0%    /var/log

atom# df -h
Filesystem         Size    Used   Avail Capacity  Mounted on
/dev/ad12s1a        15G    1.0G     13G     7%    /
devfs              1.0K    1.0K      0B   100%    /dev
linprocfs          4.0K    4.0K      0B   100%    /usr/compat/linux/proc
tank/DATA          1.8T    292G    1.5T    16%    /DATA
tank/home          1.5T      0B    1.5T     0%    /home
tank/home/jago     1.5T    128K    1.5T     0%    /home/jago
tank/home/karni    1.5T      0B    1.5T     0%    /home/karni
tank/usr/local     1.5T    455M    1.5T     0%    /usr/local
tank/usr/obj       1.5T      0B    1.5T     0%    /usr/obj
tank/usr/ports     1.5T    412M    1.5T     0%    /usr/ports
tank/usr/src       1.5T    495M    1.5T     0%    /usr/src
tank/var/log       1.5T    256K    1.5T     0%    /var/log

Considering that every single filesystem is part of the exact same pool, with no custom options whatsoever used during filesystem creation (except for mountpoints), why is the size of tank/DATA 1.8T while the others are 1.5T?

- Sincerely,
Dan Naumov

From owner-freebsd-fs@FreeBSD.ORG Sat Jul 4 22:12:24 2009 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2D7781065675; Sat, 4 Jul 2009 22:12:24 +0000 (UTC) (envelope-from jilles@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 031E68FC48; Sat, 4 Jul 2009 22:12:24 +0000 (UTC) (envelope-from jilles@FreeBSD.org) Received: from freefall.freebsd.org (jilles@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n64MCNBk062210; Sat, 4 Jul 2009 22:12:23 GMT (envelope-from jilles@freefall.freebsd.org) Received: (from jilles@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n64MCNx9062206; Sat, 4 Jul 2009 22:12:23 GMT (envelope-from jilles) Date: Sat, 4 Jul 2009 22:12:23 GMT Message-Id: <200907042212.n64MCNx9062206@freefall.freebsd.org> To: bjoern@cs.tu-berlin.de, jilles@FreeBSD.org, freebsd-fs@FreeBSD.org From: jilles@FreeBSD.org Cc: Subject: Re: kern/104133: [ext2fs] EXT2FS module corrupts EXT2/3 filesystems X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 Jul 2009 22:12:24 -0000

Synopsis: [ext2fs] EXT2FS module corrupts EXT2/3 filesystems State-Changed-From-To: feedback->open 
State-Changed-By: jilles State-Changed-When: Sat Jul 4 22:12:13 UTC 2009 State-Changed-Why: Requested feedback has been received. http://www.freebsd.org/cgi/query-pr.cgi?pr=104133