From owner-freebsd-fs@FreeBSD.ORG Sun Aug 14 10:27:28 2011
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 92830106566B for ; Sun, 14 Aug 2011 10:27:28 +0000 (UTC) (envelope-from quazi@bk.ru)
Received: from fallback5.mail.ru (fallback5.mail.ru [94.100.176.59]) by mx1.freebsd.org (Postfix) with ESMTP id 277408FC0A for ; Sun, 14 Aug 2011 10:27:27 +0000 (UTC)
Received: from smtp3.mail.ru (smtp3.mail.ru [94.100.176.131]) by fallback5.mail.ru (mPOP.Fallback_MX) with ESMTP id 4583E597C9F9 for ; Sun, 14 Aug 2011 14:11:50 +0400 (MSD)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=mail.ru; s=mail; h=Content-Type:Subject:To:MIME-Version:From:Date:Message-ID; bh=r20huLYXuIN9578oJ3hEbYoRBu5XQWfYiqyObciweS4=; b=AMLXnpjpLBZOiP96D1dfxh1LXaC87PgwBec2z64HlGXGCP9SdiGJba4rbKIPnixz9DpvX3wbdDi/Z4YK08Q8hsrTCMzV9Ze7uY2GvtNaOvovV5HlZgIimQK6WqaDZr9u;
Received: from [178.126.178.244] (port=43747 helo=QUAZIS.SNNLAN.local) by smtp3.mail.ru with asmtp id 1QsXfS-0005BI-00 for freebsd-fs@freebsd.org; Sun, 14 Aug 2011 14:11:42 +0400
Message-ID: <4E47A065.1070709@bk.ru>
Date: Sun, 14 Aug 2011 13:16:05 +0300
From: Ruslan Yakovlev
User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:5.0) Gecko/20110804 Thunderbird/5.0
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
X-Spam: Not detected
X-Mras: Ok
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: ZFS: i/o error all block copies unavailable
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 14 Aug 2011 10:27:28 -0000

Hi all

After a power-down on FreeBSD 8.2-STABLE #6 (since updated to #7, but the
problem remains) I can't boot from ZFS v28.
gptzfsboot prints
boot: ZFS: i/o error all block copies unavailable
instead of
boot: qroot:/boot/kernel/kernel
I downloaded the FreeBSD 9.0-BETA1 image and booted from it. I can mount my
ZFS storage. I copied /boot from the ZFS storage to a flash drive, and now the
kernel boots from flash fine; after that the ZFS storage is mounted as / and
everything works. zpool scrub doesn't detect any problems. zpool status
reports "No known data errors".
But this is too slow, and I want to boot normally from the ZFS storage without
loading the kernel from flash. How can I fix "ZFS: i/o error all block
copies unavailable"?

Now I have
FreeBSD QUAZIS.SNNLAN.local 8.2-STABLE FreeBSD 8.2-STABLE #7: Fri Aug 12
23:27:33 EEST 2011 root@QUAZIS.SNNLAN.local:/usr/obj/usr/src/sys/main8 i386

=>        34  976770988  ad4  GPT  (465G)
          34        256    1  freebsd-boot  (128k)
         290   16777216    2  freebsd-swap  (8.0G)
    16777506  959993516    3  freebsd-zfs  (457G)

From owner-freebsd-fs@FreeBSD.ORG Sun Aug 14 11:38:08 2011
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2142C106564A for ; Sun, 14 Aug 2011 11:38:08 +0000 (UTC) (envelope-from rubyneko@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id A10118FC13 for ; Sun, 14 Aug 2011 11:38:07 +0000 (UTC)
Received: by bkat8 with SMTP id t8so3255543bka.13 for ; Sun, 14 Aug 2011 04:38:06 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:cc:content-type; bh=0KD9wo8lbUAnOyK06acazdv6F6roAH6y50np5cCLSGI=; b=XZ2FtRu55tiuC5UuimR9pXDeCR7rQ1wkhKspTPXW7vaClY/N9zKGvBnNThE7h6J9b3 wxs8mNZWlaj80u5DlzemkNIqQ0pFZTxWThkTmXV/k/K36Hjmw5KVisEeH3rYK1rL+0Ay TVyu8V1L/jpAOuesOjIi5IbhYuH+vLUctaI8Q=
MIME-Version: 1.0
Received: by 10.204.200.193 with SMTP id ex1mr550416bkb.39.1313320236023; Sun, 14 Aug 2011 04:10:36 -0700 (PDT)
Received: by 10.204.102.7 with HTTP; Sun, 14 Aug 2011 04:10:35
 -0700 (PDT)
Date: Sun, 14 Aug 2011 14:10:35 +0300
Message-ID:
From: rubyneko neko
To: Ruslan Yakovlev
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: i/o error all block copies unavailable
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 14 Aug 2011 11:38:08 -0000

I have the same problem.
Currently I'm running from kernel.old.

gpart bootcode -p /boot/gptzfsboot -i 1 ad4
did not work for me.

Any ideas?

On Sun, 2011-08-14 at 13:16 +0300, Ruslan Yakovlev wrote:
> Hi all
> After power down on FreeBSD 8.2-STABLE #6 (now updated to #7, but
> problem standing) I can't boot from ZFS v28.
> gptzfsboot wrote
> boot: ZFS: i/o error all block copies unavailable
> instead
> boot: qroot:/boot/kernel/kernel
> I download FreeBSD 9.0-BETA1 image and boot from it. I can mount my ZFS
> storage. I copy /boot from ZFS storage to flash and now kernel booted
> from flash fine, after that ZFS storage mounted as / and all work. zpool
> scrub don't detect any problems. zpool status wrote "No known data errors".
> But it too slowly and I want normally boot from ZFS storage without
> loading kernel from flash. How can I fix "ZFS: i/o error all block
> copies unavailable" ?
>
> Now I have
> FreeBSD QUAZIS.SNNLAN.local 8.2-STABLE FreeBSD 8.2-STABLE #7: Fri Aug 12
> 23:27:33 EEST 2011 root@QUAZIS.SNNLAN.local:/usr/obj/usr/src/sys/main8 i386
>
> => 34 976770988 ad4 GPT (465G)
>
> 34 256 1 freebsd-boot (128k)
>
> 290 16777216 2 freebsd-swap (8.0G)
>
> 16777506 959993516 3 freebsd-zfs (457G)
>
>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Sun Aug 14 12:23:35 2011
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EB4391065672 for ; Sun, 14 Aug 2011 12:23:35 +0000 (UTC) (envelope-from quazi@bk.ru)
Received: from smtp13.mail.ru (smtp13.mail.ru [94.100.176.90]) by mx1.freebsd.org (Postfix) with ESMTP id 554008FC08 for ; Sun, 14 Aug 2011 12:23:35 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=mail.ru; s=mail; h=Content-Type:In-Reply-To:References:Subject:To:MIME-Version:From:Date:Message-ID; bh=8oYgFd0dT0Jf1AULUqz0RHz7vKGh3cnwUVhMr9bst/U=; b=tmeVWDhK21wo7W1UKCvXBSyyAzeptdj7XrQj8X5F8rMe3CjzJAG2iAYlmWTYqBUvztj7+0o+laD7SXHZQXr8zINMjIQYKn/3AoO+Mw368vuI41IiGlGUzof/d15PrvJI;
Received: from [178.126.178.244] (port=18963 helo=QUAZIS.SNNLAN.local) by smtp13.mail.ru with asmtp id 1QsZj2-00064s-00 for freebsd-fs@freebsd.org; Sun, 14 Aug 2011 16:23:33 +0400
Message-ID: <4E47BF5B.3010102@bk.ru>
Date: Sun, 14 Aug 2011 15:28:11 +0300
From: Ruslan Yakovlev
User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:5.0) Gecko/20110804 Thunderbird/5.0
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References:
In-Reply-To:
X-Spam: Not detected
X-Mras: Ok
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: Re: ZFS: i/o error all block
 copies unavailable
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 14 Aug 2011 12:23:36 -0000

I don't think it is a bootcode problem. I did not modify my bootcode when the
power went down. It is some problem in ZFS.
When I try to import the ZFS pool (in 9.0-BETA1) it says that the pool is busy.
Only zpool import -f works.
After that I changed the mountpoint, listed my files, set the mountpoint back
to / and rebooted. Now on boot it prints many errors (many "ZFS: i/o
error.." lines) along with file names.
The first one is /boot/kernel/kernel.
When I try to list files from the bootcode, I can see only / and /boot; on
/boot/kernel it prints "ZFS: i/o error.."
But the copy of /boot/kernel works fine from flash.
I did
# cp -R /boot /boot.new
# mv /boot /boot.broken
# mv /boot.new /boot
Now the kernel boots, but it stops when it tries to mount the ZFS storage as
root. If I select the boot prompt from the menu and run
load /boot/kernel/zfs.ko, it prints "ZFS: i/o error.."
pmbr and gptzfsboot from 9.0-BETA1 don't change anything. The problem
remains.
And I can't boot from kernel.old (it prints "ZFS: i/o error.." too).

I think that if I copied all my files to other storage and rebuilt the ZFS
pool, the problem would go away, but right now I don't have any other storage
for all my data.

On 14.08.2011 14:10, rubyneko neko wrote:
> I have some problem too.
> Currently I'm working from kernel.old.
>
> gpart bootcode -p /boot/gptzfsboot -i 1 ad4
> for my not work.
>
> any idea?
>
> On Sun, 2011-08-14 at 13:16 +0300, Ruslan Yakovlev wrote:
> > Hi all
> > After power down on FreeBSD 8.2-STABLE #6 (now updated to #7, but
> > problem standing) I can't boot from ZFS v28.
> > gptzfsboot wrote
> > boot: ZFS: i/o error all block copies unavailable
> > instead
> > boot: qroot:/boot/kernel/kernel
> > I download FreeBSD 9.0-BETA1 image and boot from it. I can mount my ZFS
> > storage. I copy /boot from ZFS storage to flash and now kernel booted
> > from flash fine, after that ZFS storage mounted as / and all work.
> zpool
> > scrub don't detect any problems. zpool status wrote "No known data
> errors".
> > But it too slowly and I want normally boot from ZFS storage without
> > loading kernel from flash. How can I fix "ZFS: i/o error all block
> > copies unavailable" ?
> >
> > Now I have
> > FreeBSD QUAZIS.SNNLAN.local 8.2-STABLE FreeBSD 8.2-STABLE #7: Fri
> Aug 12
> > 23:27:33 EEST 2011
> root@QUAZIS.SNNLAN.local:/usr/obj/usr/src/sys/main8 i386
> >
> > => 34 976770988 ad4 GPT (465G)
> >
> > 34 256 1 freebsd-boot (128k)
> >
> > 290 16777216 2 freebsd-swap (8.0G)
> >
> > 16777506 959993516 3 freebsd-zfs (457G)
> >
> >
> >
> > _______________________________________________
> > freebsd-fs@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org
> "
>
>

From owner-freebsd-fs@FreeBSD.ORG Sun Aug 14 12:54:04 2011
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E15F8106564A for ; Sun, 14 Aug 2011 12:54:03 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org)
Received: from fep34.mx.upcmail.net (fep34.mx.upcmail.net [62.179.121.52]) by mx1.freebsd.org (Postfix) with ESMTP id 66F778FC0A for ; Sun, 14 Aug 2011 12:54:03 +0000 (UTC)
Received: from edge02.upcmail.net ([192.168.13.237]) by viefep11-int.chello.at (InterMail vM.8.01.02.02 201-2260-120-106-20100312) with ESMTP id <20110814122600.ELYY1647.viefep11-int.chello.at@edge02.upcmail.net>; Sun, 14 Aug 2011 14:26:00 +0200
Received: from pinky ([95.96.138.26]) by edge02.upcmail.net with edge id LCRy1h0090aMTqv02CRzvh; Sun, 14 Aug 2011 14:26:00 +0200
X-SourceIP: 95.96.138.26
Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes
To: freebsd-fs@freebsd.org, "Ruslan Yakovlev"
References: <4E47BF5B.3010102@bk.ru>
Date: Sun, 14 Aug 2011 14:26:03 +0200
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: "Ronald Klop"
Message-ID:
In-Reply-To: <4E47BF5B.3010102@bk.ru>
User-Agent: Opera Mail/11.50 (Win32)
X-Cloudmark-Analysis: v=1.1 cv=8aHJgfg0GQPVAsFhHUWrXuSEk7IPywT3HfAl6KezIcg= c=1 sm=0 a=jSLzLkXI7GEA:10 a=eO4J7RWVLuUA:10 a=bgpUlknNv7MA:10 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=ECAJpNFI83C1cmKNUt0A:9 a=6NtR6f4NIeGg6he39AYA:7 a=CjuIK1q_8ugA:10 a=SV7veod9ZcQA:10 a=HpAAvcLHHh0Zw7uRqdWCyQ==:117
Cc:
Subject: Re: ZFS: i/o error all block copies unavailable
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 14 Aug 2011 12:54:04 -0000

Is /boot/zfs/zpool.cache correct for the current setup?

Ronald.

On Sun, 14 Aug 2011 14:28:11 +0200, Ruslan Yakovlev wrote:
> I think it is not bootcode problem. I not modify my bootcode when power
> halted. It is some problems in ZFS.
> When I probe import ZFS pool (in 9.0-BETA1) it wrote that pool is busy.
> Only zpool import -f work.
> After that I change mountpoint, list my files, replace mountpoint to /
> and reboot. Now on boot it wrote many errors (many strings "ZFS: i/o
> error..") and wrote file names.
> First it /boot/kernel/kernel
> when I probe list files from bootcode, I can see only / and /boot, on
> /boot/kernel it wrote "ZFS: i/o error.."
> But now copy of /boot/kernel work fine from flash
> I do
> # copy -r /boot /boot.new
> # move /boot /boot.broken
> # move /boot.new /boot
> Now kernel boot, but stopped when probe mount ZFS storage as root. If I
> select boot string from menu and do #load /boot/kernel/zfs.ko it wrote
> "ZFS: i/o error.."
> pmbr and gptzfsboot from 9.0-BETA1 don't change anything. Problem
> staying.
> And I can't boot from kernel.old (it wrote "ZFS: i/o error.." too)
>
> I think if I copy all my files to other storage and rebuild ZFS pool,
> problem leave, but now I don't have any other storage for all my data.
>
> On 14.08.2011 14:10, rubyneko neko wrote:
>> I have some problem too.
>> Currently I'm working from kernel.old.
>>
>> gpart bootcode -p /boot/gptzfsboot -i 1 ad4
>> for my not work.
>>
>> any idea?
>>
>> On Sun, 2011-08-14 at 13:16 +0300, Ruslan Yakovlev wrote:
>> > Hi all
>> > After power down on FreeBSD 8.2-STABLE #6 (now updated to #7, but
>> > problem standing) I can't boot from ZFS v28.
>> > gptzfsboot wrote
>> > boot: ZFS: i/o error all block copies unavailable
>> > instead
>> > boot: qroot:/boot/kernel/kernel
>> > I download FreeBSD 9.0-BETA1 image and boot from it. I can mount my
>> ZFS
>> > storage. I copy /boot from ZFS storage to flash and now kernel booted
>> > from flash fine, after that ZFS storage mounted as / and all work.
>> zpool
>> > scrub don't detect any problems. zpool status wrote "No known data
>> errors".
>> > But it too slowly and I want normally boot from ZFS storage without
>> > loading kernel from flash. How can I fix "ZFS: i/o error all block
>> > copies unavailable" ?
>> >
>> > Now I have
>> > FreeBSD QUAZIS.SNNLAN.local 8.2-STABLE FreeBSD 8.2-STABLE #7: Fri Aug
>> 12
>> > 23:27:33 EEST 2011
>> root@QUAZIS.SNNLAN.local:/usr/obj/usr/src/sys/main8 i386
>> >
>> > => 34 976770988 ad4 GPT (465G)
>> >
>> > 34 256 1 freebsd-boot (128k)
>> >
>> > 290 16777216 2 freebsd-swap (8.0G)
>> >
>> > 16777506 959993516 3 freebsd-zfs (457G)
>> >
>> >
>> >
>> > _______________________________________________
>> > freebsd-fs@freebsd.org mailing list
>> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org
>> "
>>
>>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Mon Aug 15 11:07:01 2011
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8A6521065675 for ; Mon, 15 Aug 2011 11:07:01 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 6F59F8FC18 for ; Mon, 15 Aug 2011 11:07:01 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p7FB71vl014720 for ; Mon, 15 Aug 2011 11:07:01 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p7FB707I014718 for freebsd-fs@FreeBSD.org; Mon, 15 Aug 2011 11:07:00 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 15 Aug 2011 11:07:00 GMT
Message-Id: <201108151107.p7FB707I014718@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To:
 freebsd-fs@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 15 Aug 2011 11:07:01 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/159418  fs         [tmpfs] [panic] tmpfs kernel panic: recursing on non r
o kern/159402  fs         [zfs][loader] symlinks cause I/O errors
o kern/159357  fs         [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356  fs         [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351  fs         [nfs] [patch] - divide by zero in mountnfs()
o kern/159251  fs         [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159233  fs         [ext2fs] [patch] fs/ext2fs: finish reallocblk implemen
o kern/159232  fs         [ext2fs] [patch] fs/ext2fs: merge ext2_readwrite into
o kern/159077  fs         [zfs] Can't cd .. with latest zfs version
o kern/159048  fs         [smbfs] smb mount corrupts large files
o kern/159045  fs         [zfs] [hang] ZFS scrub freezes system
o kern/158839  fs         [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802  fs         [amd] amd(8) ICMP storm and unkillable process.
o kern/158711  fs         [ffs] [panic] panic in ffs_blkfree and ffs_valloc
o kern/158231  fs         [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929  fs         [nfs] NFS slow read
o kern/157728  fs         [zfs] zfs (v28) incremental receive may leave behind t
o kern/157722  fs         [geli] unable to newfs a geli encrypted partition
o kern/157399  fs         [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs         [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156933  fs         [zfs] ZFS receive after read on readonly=on filesystem
o kern/156797  fs         [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781  fs         [zfs] zfs is losing the snapshot directory,
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156168  fs         [nfs] [panic] Kernel panic under concurrent access ove
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
o kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
o kern/154447  fs         [zfs] [panic] Occasional panics - solaris assert somew
p kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153847  fs         [nfs] [panic] Kernel panic from incorrect m_free in nf
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs         [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
p kern/152488  fs         [tmpfs] [patch] mtime of file updated when only inode
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o kern/151845  fs         [smbfs] [patch] smbfs should be upgraded to support Un
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/151111  fs         [zfs] vnodes leakage during zfs unmount
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs         zpool(1): zpool import -d /dev tries to open weird dev
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris installa
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs         [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs         [nfs] UDP NFS causes overload
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs         [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an
o bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133174  fs         [msdosfs] [patch] msdosfs must support multibyte inter
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor" on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
f kern/130133  fs         [panic] [zfs] 'kmem_map too small' caused by make clea
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
f kern/127375  fs         [zfs] If vm.kmem_size_max>"1073741823" then write spee
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
f kern/126703  fs         [panic] [zfs] _mtx_lock_sleep: recursed on non-recursi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs         [msdosfs] corrupts new files
f sparc/123566 fs         [zfs] zpool import issue: EOVERFLOW
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121366   fs         [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
f kern/120210  fs         [zfs] [panic] reboot after panic: solaris assert: arc_
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o bin/118249   fs         [ufs] mv(1): moving a directory changes its mtime
o kern/118126  fs         [nfs] [patch] Poor NFS server write performance
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with
o kern/116583  fs         [ffs] [hang] System freezes for short time when using
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs         [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o kern/33464   fs         [ufs] soft update inconsistencies after system crash
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

244 problems total.
From owner-freebsd-fs@FreeBSD.ORG Mon Aug 15 17:46:18 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 82151106566C for ; Mon, 15 Aug 2011 17:46:18 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 5691D8FC16 for ; Mon, 15 Aug 2011 17:46:18 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id F34AC46B32; Mon, 15 Aug 2011 13:46:17 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 7EA0F8A02F; Mon, 15 Aug 2011 13:46:17 -0400 (EDT) From: John Baldwin To: freebsd-fs@freebsd.org Date: Mon, 15 Aug 2011 13:43:14 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <1687823014.1491995.1312757266327.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <1687823014.1491995.1312757266327.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <201108151343.14655.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Mon, 15 Aug 2011 13:46:17 -0400 (EDT) Cc: onwahe@gmail.com Subject: Re: NFS calculation of max commit size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2011 17:46:18 -0000 On Sunday, August 07, 2011 6:47:46 pm Rick Macklem wrote: > A recent PR (kern/159351) noted that the following > calculation results in a divide-by-zero when > desiredvnodes < 1000. 
>
>   nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000);
>
> Just fixing the divide-by-zero is easy enough, but I'm not
> sure what this calculation is trying to do. Making it a fraction
> of "hibufspace" makes sense (nm_wcommitsize is the maximum # of
> bytes of uncommitted data in the NFS client's buffer cache blocks,
> if I understand it correctly), but why divide it by
>
>   (desiredvnodes / 1000) ??
>
> Maybe thinking that fewer vnodes means sharing it with fewer
> other file systems or ???
>
> Anyhow, it seems to me that the formulae is bogus for small
> values of desiredvnodes (for example desiredvnodes == 1500
> implies nm_wcommitsize == hibufspace, which sounds too large
> to me).
>
> I'm thinking that putting an upper limit of 10% of hibufspace
> might make sense. ie. Change the above to:
>
>   if (desiredvnodes >= 11000)
>       nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000);
>   else
>       nmp->nm_wcommitsize = hibufspace / 10;
>
> Anyone have comments or insight into this calculation?
>
> rick
> ps: jhb, I hope you don't mind. I emailed you first and then
> thought others might have some ideas, too.

Oh no, this is fine. A broader discussion is probably warranted. I honestly
don't know what the goal is. I do think it is an attempt to share with other
file systems, but I'm not sure how desiredvnodes / 1000 is useful for that.
It also seems that we can end up setting this woefully low as well. That is,
I wonder if we need a minimum of 10% of hibufspace so that it can scale
between 10% and 90% of hibufspace (but I'm not sure what you would use to
pick the scaling factor sanely). To my mind what you really want to do is
something like 'hibufspace / (number of active mounts)', but that will not
really work correctly unless we recalculate the value on each mount and
unmount operation.
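For readers following the arithmetic, here is a minimal userland sketch of
the calculation under discussion. It is not the kernel code: the function
names original_divisor() and wcommitsize_proposed() are hypothetical, and
hibufspace and desiredvnodes are passed in as plain parameters rather than
read from kernel globals. It only illustrates why the integer divisor is
zero below 1000 vnodes and how the cap proposed in the quoted message
behaves.

```c
#include <stddef.h>

/*
 * Hypothetical helper: the divisor used by the original formula,
 *   nm_wcommitsize = hibufspace / (desiredvnodes / 1000)
 * Integer division truncates, so any desiredvnodes < 1000 yields 0
 * (the divide-by-zero from the PR), and 1000..1999 yields 1
 * (nm_wcommitsize == hibufspace).
 */
static int
original_divisor(int desiredvnodes)
{
	return (desiredvnodes / 1000);
}

/*
 * Hypothetical helper: the variant proposed in the quoted message.
 * Below 11000 vnodes the result is capped at 10% of hibufspace,
 * which also avoids the zero divisor; at or above 11000 the
 * original formula is used and gives at most hibufspace / 11.
 */
static long
wcommitsize_proposed(long hibufspace, int desiredvnodes)
{
	if (desiredvnodes >= 11000)
		return (hibufspace / original_divisor(desiredvnodes));
	return (hibufspace / 10);
}
```

With this variant the value never exceeds 10% of hibufspace, but it still
has no lower bound, which is the concern raised in the reply above about
ending up "woefully low" for large values of desiredvnodes.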
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Mon Aug 15 22:58:15 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C57281065678; Mon, 15 Aug 2011 22:58:15 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 6DBAC8FC1A; Mon, 15 Aug 2011 22:58:15 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AsMAADqjSU6DaFvO/2dsb2JhbABBhEiUAJBFgUABAQUjBFIbDgoCAg0ZAlkGLrEgkVuBLIQLgRAEkxKREQ X-IronPort-AV: E=Sophos;i="4.67,376,1309752000"; d="scan'208";a="134489408" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 15 Aug 2011 18:58:14 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 85B90B4010; Mon, 15 Aug 2011 18:58:14 -0400 (EDT) Date: Mon, 15 Aug 2011 18:58:14 -0400 (EDT) From: Rick Macklem To: John Baldwin Message-ID: <1730399830.175988.1313449094531.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201108151343.14655.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, onwahe@gmail.com Subject: Re: NFS calculation of max commit size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Aug 2011 22:58:16 -0000 John Baldwin wrote: > On Sunday, August 07, 2011 6:47:46 pm Rick Macklem wrote: > > A recent PR (kern/159351) noted that the following > > calculation results in a divide-by-zero when > > desiredvnodes < 
1000. > > > > nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000); > > > > Just fixing the divide-by-zero is easy enough, but I'm not > > sure what this calculation is trying to do. Making it a fraction > > of "hibufspace" makes sense (nm_wcommitsize is the maximum # of > > bytes of uncommitted data in the NFS client's buffer cache blocks, > > if I understand it correctly), but why divide it by > > > > (desiredvnodes / 1000) ?? > > > > Maybe thinking that fewer vnodes means sharing it with fewer > > other file systems or ??? > > > > Anyhow, it seems to me that the formulae is bogus for small > > values of desiredvnodes (for example desiredvnodes == 1500 > > implies nm_wcommitsize == hibufspace, which sounds too large > > to me). > > > > I'm thinking that putting an upper limit of 10% of hibufspace > > might make sense. ie. Change the above to: > > > > if (desiredvnodes >= 11000) > > nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000); > > else > > nmp->nm_wcommitsize = hibufspace / 10; > > > > Anyone have comments or insight into this calculation? > > > > rick > > ps: jhb, I hope you don't mind. I emailed you first and then > > thought others might have some ideas, too. > > Oh no, this is fine. A broader discussion is probably warranted. I > honestly > don't know what the goal is. I do think it is an attempt to share with > other > file systems, but I'm not sure how desiredvnodes / 1000 is useful for > that. > It also seems that we can end up setting this woefully low as well. > That is, > I wonder if we need a minimum of 10% of hibufspace so that it can > scale > between 10% and 90% of hibufspace (but I'm not sure what you would use > to > pick the scaling factor sanely). To my mind what you really want to do > is > something like 'hibufspace / (number of active mounts)', but that will > not > really work correctly unless we recalculate the value on each mount > and > unmount operation. 
>
> --
> John Baldwin

Btw, this was done by r147280 6.5 years ago, so the formula doesn't seem
to be causing a lot of grief. Also of some interest is the fact that
wcommitsize appears to have been settable on a per-mount-point basis until
mount_nfs(8) was converted to nmount(2). { There is no nmount option to set it. }

Btw, when nm_wcommitsize is exceeded, writes become synchronous, so it affects
how much write behind happens. This, in turn, affects how bursty (is this a real
word? hopefully you get what I mean?) the write traffic to the server is.

What I'm not sure about is what happens when multiple mounts use up the entire
buffer cache with write behinds. I'll try a little experiment to see if I
can find that out. (If making it large isn't detrimental, then I tend to
agree that the above sets nm_wcommitsize very small.)

Since "desiredvnodes" will seldom be less than 1000, I'm not going to
rush to a solution.

Anyone who has insight into what this formula should be, please let us know.

rick

From owner-freebsd-fs@FreeBSD.ORG Tue Aug 16 02:26:00 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 25EDB106566C for ; Tue, 16 Aug 2011 02:26:00 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta02.westchester.pa.mail.comcast.net (qmta02.westchester.pa.mail.comcast.net [76.96.62.24]) by mx1.freebsd.org (Postfix) with ESMTP id C2DC18FC0A for ; Tue, 16 Aug 2011 02:25:58 +0000 (UTC) Received: from omta17.westchester.pa.mail.comcast.net ([76.96.62.89]) by qmta02.westchester.pa.mail.comcast.net with comcast id Lpjz1h0031vXlb852qRztJ; Tue, 16 Aug 2011 02:25:59 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta17.westchester.pa.mail.comcast.net with comcast id LqRw1h00c1t3BNj3dqRxQs; Tue, 16 Aug 2011 02:25:58 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id E60D2102C1A; Mon, 15 Aug 2011 19:25:54 -0700 (PDT) Date:
Mon, 15 Aug 2011 19:25:54 -0700 From: Jeremy Chadwick To: Rick Macklem Message-ID: <20110816022554.GA6018@icarus.home.lan> References: <201108151343.14655.jhb@freebsd.org> <1730399830.175988.1313449094531.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1730399830.175988.1313449094531.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, onwahe@gmail.com Subject: Re: NFS calculation of max commit size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 02:26:00 -0000 On Mon, Aug 15, 2011 at 06:58:14PM -0400, Rick Macklem wrote: > John Baldwin wrote: > > On Sunday, August 07, 2011 6:47:46 pm Rick Macklem wrote: > > > A recent PR (kern/159351) noted that the following > > > calculation results in a divide-by-zero when > > > desiredvnodes < 1000. > > > > > > nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000); > > > > > > Just fixing the divide-by-zero is easy enough, but I'm not > > > sure what this calculation is trying to do. Making it a fraction > > > of "hibufspace" makes sense (nm_wcommitsize is the maximum # of > > > bytes of uncommitted data in the NFS client's buffer cache blocks, > > > if I understand it correctly), but why divide it by > > > > > > (desiredvnodes / 1000) ?? > > > > > > Maybe thinking that fewer vnodes means sharing it with fewer > > > other file systems or ??? > > > > > > Anyhow, it seems to me that the formulae is bogus for small > > > values of desiredvnodes (for example desiredvnodes == 1500 > > > implies nm_wcommitsize == hibufspace, which sounds too large > > > to me). > > > > > > I'm thinking that putting an upper limit of 10% of hibufspace > > > might make sense. ie. 
Change the above to: > > > > > > if (desiredvnodes >= 11000) > > > nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000); > > > else > > > nmp->nm_wcommitsize = hibufspace / 10; > > > > > > Anyone have comments or insight into this calculation? > > > > > > rick > > > ps: jhb, I hope you don't mind. I emailed you first and then > > > thought others might have some ideas, too. > > > > Oh no, this is fine. A broader discussion is probably warranted. I > > honestly > > don't know what the goal is. I do think it is an attempt to share with > > other > > file systems, but I'm not sure how desiredvnodes / 1000 is useful for > > that. > > It also seems that we can end up setting this woefully low as well. > > That is, > > I wonder if we need a minimum of 10% of hibufspace so that it can > > scale > > between 10% and 90% of hibufspace (but I'm not sure what you would use > > to > > pick the scaling factor sanely). To my mind what you really want to do > > is > > something like 'hibufspace / (number of active mounts)', but that will > > not > > really work correctly unless we recalculate the value on each mount > > and > > unmount operation. > > > > -- > > John Baldwin > Btw, this was done by r147280 6.5years ago, so the formula doesn't seem > to be causing a lot of grief. Also of some interest is the fact that > wcommitsize appears to have been setable on a per-mount-point-basis until > mount_nfs(8) was converted to nmount(2). { There is no nmount option to set it. } > > Btw, when nm_wcommitsize is exceeded, writes become synchronous, so it affects > how much write behind happens. This, in turn, affects how bursty (is this a real > word? hopefully you get what I mean?) the write traffic to the server is. > > What I'm not sure about is what happens when multiple mounts use up the entire > buffer cache with write behinds. I'll try a little experiment to see if I > can find that out. 
(If making it large isn't detrimental, then I tend to > agree that the above sets nm_wcommitsize very small.) > > Since "desiredvnodes" will seldom be less than 1000, I'm not going to > rush to a solution. > > Anyone who has insight into what this formula should be, please let us know. The commit message tries to explain it, but it's more than just a one-line change. http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/nfsclient/nfs_vfsops.c#rev1.177 There's also an associated PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=79208 -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Aug 16 04:05:20 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 26E25106566C; Tue, 16 Aug 2011 04:05:20 +0000 (UTC) (envelope-from jwd@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id F240A8FC12; Tue, 16 Aug 2011 04:05:19 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p7G45J7v058575; Tue, 16 Aug 2011 04:05:19 GMT (envelope-from jwd@freefall.freebsd.org) Received: (from jwd@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p7G45JxW058574; Tue, 16 Aug 2011 04:05:19 GMT (envelope-from jwd) Date: Tue, 16 Aug 2011 04:05:19 +0000 From: John To: freebsd-fs@freebsd.org Message-ID: <20110816040519.GA49864@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i Cc: freebsd-current@freebsd.org Subject: Three LOR with latest -current X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 04:05:20 -0000 Hi folks, I'm seeing 3 lock order reversals with an up-to-date -current system. Stock system, GENERIC kernel. Let me know if this isn't enough information. Just booting the system and the dmesg. Thanks, John lock order reversal: 1st 0xfffffe0289627db8 ufs (ufs) @ /usr/src.2011-08-14_10.53pm_EDT/sys/ufs/ffs/ffs_snapshot.c:425 2nd 0xffffff9f0db49778 bufwait (bufwait) @ /usr/src.2011-08-14_10.53pm_EDT/sys/kern/vfs_bio.c:2658 3rd 0xfffffe00404a8098 ufs (ufs) @ /usr/src.2011-08-14_10.53pm_EDT/sys/ufs/ffs/ffs_snapshot.c:546 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x807 __lockmgr_args() at __lockmgr_args+0xdc6 ffs_lock() at ffs_lock+0x8c VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b _vn_lock() at _vn_lock+0x47 ffs_snapshot() at ffs_snapshot+0x1c31 ffs_mount() at ffs_mount+0xa24 vfs_donmount() at vfs_donmount+0xddc nmount() at nmount+0x63 syscallenter() at syscallenter+0x1aa syscall() at syscall+0x4c Xfast_syscall() at Xfast_syscall+0xdd --- syscall (378, FreeBSD ELF64, nmount), rip = 0x800abde1c, rsp = 0x7fffffffd968, rbp = 0x801008130 --- lock order reversal: 1st 0xffffff9f0db49778 bufwait (bufwait) @ /usr/src.2011-08-14_10.53pm_EDT/sys/kern/vfs_bio.c:2658 2nd 0xfffffe004034dcb0 snaplk (snaplk) @ /usr/src.2011-08-14_10.53pm_EDT/sys/ufs/ffs/ffs_snapshot.c:818 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x807 __lockmgr_args() at __lockmgr_args+0xdc6 ffs_lock() at ffs_lock+0x8c VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b _vn_lock() at _vn_lock+0x47 ffs_snapshot() at ffs_snapshot+0x1b0c ffs_mount() at ffs_mount+0xa24 vfs_donmount() at vfs_donmount+0xddc nmount() at 
nmount+0x63 syscallenter() at syscallenter+0x1aa syscall() at syscall+0x4c Xfast_syscall() at Xfast_syscall+0xdd --- syscall (378, FreeBSD ELF64, nmount), rip = 0x800abde1c, rsp = 0x7fffffffd968, rbp = 0x801008130 --- lock order reversal: 1st 0xfffffe004034dcb0 snaplk (snaplk) @ /usr/src.2011-08-14_10.53pm_EDT/sys/kern/vfs_vnops.c:301 2nd 0xfffffe0289627db8 ufs (ufs) @ /usr/src.2011-08-14_10.53pm_EDT/sys/ufs/ffs/ffs_snapshot.c:1620 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x807 __lockmgr_args() at __lockmgr_args+0xdc6 ffs_snapremove() at ffs_snapremove+0xe7 ffs_truncate() at ffs_truncate+0x302 ufs_inactive() at ufs_inactive+0x260 vinactive() at vinactive+0x72 vputx() at vputx+0x386 vn_close() at vn_close+0x118 vn_closefile() at vn_closefile+0x5a _fdrop() at _fdrop+0x23 closef() at closef+0x5c kern_close() at kern_close+0x121 syscallenter() at syscallenter+0x1aa syscall() at syscall+0x4c Xfast_syscall() at Xfast_syscall+0xdd --- syscall (6, FreeBSD ELF64, close), rip = 0x800b5e2bc, rsp = 0x7fffffffd968, rbp = 0 --- From owner-freebsd-fs@FreeBSD.ORG Tue Aug 16 10:12:32 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E8EDF106564A for ; Tue, 16 Aug 2011 10:12:31 +0000 (UTC) (envelope-from simon@comsys.ntu-kpi.kiev.ua) Received: from comsys.kpi.ua (comsys.kpi.ua [77.47.192.42]) by mx1.freebsd.org (Postfix) with ESMTP id 670408FC19 for ; Tue, 16 Aug 2011 10:12:31 +0000 (UTC) Received: from pm513-1.comsys.kpi.ua ([10.18.52.101] helo=pm513-1.comsys.ntu-kpi.kiev.ua) by comsys.kpi.ua with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1QtGdK-0001gW-8M; Tue, 16 Aug 2011 13:12:30 +0300 Received: by pm513-1.comsys.ntu-kpi.kiev.ua (Postfix, from userid 1001) id DF6711CC21; Tue, 16 
Aug 2011 13:12:29 +0300 (EEST) Date: Tue, 16 Aug 2011 13:12:29 +0300 From: Andrey Simonenko To: Martin Birgmeier Message-ID: <20110816101229.GA2012@pm513-1.comsys.ntu-kpi.kiev.ua> References: <4E4657BD.2090803@aon.at> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4E4657BD.2090803@aon.at> User-Agent: Mutt/1.5.21 (2010-09-15) X-Authenticated-User: simon@comsys.ntu-kpi.kiev.ua X-Authenticator: plain X-Sender-Verify: SUCCEEDED (sender exists & accepts mail) X-Exim-Version: 4.63 (build at 10-Dec-2010 16:36:10) X-Date: 2011-08-16 13:12:30 X-Connected-IP: 10.18.52.101:58923 X-Message-Linecount: 96 X-Body-Linecount: 80 X-Message-Size: 3854 X-Body-Size: 3209 Cc: freebsd-fs@FreeBSD.org Subject: Re: Does nfse support specifying multiple exports for one mount point? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 10:12:32 -0000 On Sat, Aug 13, 2011 at 12:53:49PM +0200, Martin Birgmeier wrote: > See http://www.freebsd.org/cgi/query-pr.cgi?pr=147881 - can I specify > multiple exports with nfse? > > I am using the patch proposed in PR 147881, even though I believe it is > incomplete (I read that somewhere). For me, it is working fine; for > example, I have > > [0]# zfs list -o name,sharenfs hal.1/backup/dumps > NAME SHARENFS > hal.1/backup/dumps -network 192.168.0.0 -mask 255.255.0.0;-network > fec0:0:0:4d42::/56 > [0]# > > which in /etc/zfs/exports translates to > > /z/backup/dumps -network 192.168.0.0 -mask 255.255.0.0 > /z/backup/dumps -network fec0:0:0:4d42::/56 > > How can I specify this using nfse? > PR/147881 proposes a way how to specify different options for different address specifications in one line. Eg. different -mapall options for different hosts in one line. 
From the nfs.exports(5) manual page: after any address specification, an
option that was already specified on the same line can be given again; its
new value overrides the previous one and is used for the following address
specifications. Such options are -mapall and -maproot, -ro and -rw, and
-sec. It is also possible to create reverse-logic options for the -no_* and
-mnt_export_brief options. The ``*'' hostname represents the default export
and can be used on a line together with other address specifications.

Example:

% cat exports
/fs -ro -mapall 1:2:3 1.1.1.1 -sec krb5 -maproot 2:3:4 * 2.2.2.2 -rw 3.3.3.3
% nfse -t exports
configure: reading file exports
Pathname /fs
  Export specifications:
    -rw -sec krb5 -maproot 2:3:4 -host 3.3.3.3
    -ro -sec krb5 -maproot 2:3:4 -host 2.2.2.2
    -ro -sec sys -mapall 1:2:3 -host 1.1.1.1
    -ro -sec krb5 -maproot 2:3:4 *

The exports(5) manual page says that address specifications must be
specified after options. The nfs.exports(5) file format allows options to
be used after address specifications, so they can override previously
specified options.

If you applied the cddl.diff patch, then you can use zfs share/unshare to
change ZFS NFS exports; they will be changed atomically, and changes will
be applied to only one file system at a time. As a result, if zfs
share/unshare is used for a ZFS file system, the export settings for that
file system from other exports files will be flushed.

The -alldirs option is also supported by the "zfs share" command, but its
logic does not follow the logic described in nfs.exports(5). If the
-alldirs option is used, then nfse will create two exports: "/fs ..." and
"/fs -subdir -alldirs ...". This is because of how zfs share/unshare works:

1. "zfs sharenfs ..." settings are not incremental. Running "zfs sharenfs"
   for a file system completely replaces its settings.

2. There is no way to pass several settings for one file system, at least
   for mountd.

3. "zfs sharenfs" does not allow exporting subdirectories.
If you are unsure about the configuration logic for nfse, just call
"nfse -t" and verify its output. At any time you can run "nfse -c show" to
verify the current NFS export settings. If you prefer to run nfse(8) in
mountd(8)-compatible mode, run it with the -C switch.

From owner-freebsd-fs@FreeBSD.ORG Tue Aug 16 13:50:53 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F3E5C106564A for ; Tue, 16 Aug 2011 13:50:52 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id B59B48FC0C for ; Tue, 16 Aug 2011 13:50:52 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 2E93F46B23; Tue, 16 Aug 2011 09:50:52 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id C4D838A037; Tue, 16 Aug 2011 09:50:51 -0400 (EDT) From: John Baldwin To: Jeremy Chadwick Date: Tue, 16 Aug 2011 09:31:35 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110617; KDE/4.5.5; amd64; ; ) References: <201108151343.14655.jhb@freebsd.org> <1730399830.175988.1313449094531.JavaMail.root@erie.cs.uoguelph.ca> <20110816022554.GA6018@icarus.home.lan> In-Reply-To: <20110816022554.GA6018@icarus.home.lan> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201108160931.35626.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Tue, 16 Aug 2011 09:50:51 -0400 (EDT) Cc: freebsd-fs@freebsd.org, onwahe@gmail.com Subject: Re: NFS calculation of max commit size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Tue, 16 Aug 2011 13:50:53 -0000 On Monday, August 15, 2011 10:25:54 pm Jeremy Chadwick wrote: > On Mon, Aug 15, 2011 at 06:58:14PM -0400, Rick Macklem wrote: > > John Baldwin wrote: > > > On Sunday, August 07, 2011 6:47:46 pm Rick Macklem wrote: > > > > A recent PR (kern/159351) noted that the following > > > > calculation results in a divide-by-zero when > > > > desiredvnodes < 1000. > > > > > > > > nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000); > > > > > > > > Just fixing the divide-by-zero is easy enough, but I'm not > > > > sure what this calculation is trying to do. Making it a fraction > > > > of "hibufspace" makes sense (nm_wcommitsize is the maximum # of > > > > bytes of uncommitted data in the NFS client's buffer cache blocks, > > > > if I understand it correctly), but why divide it by > > > > > > > > (desiredvnodes / 1000) ?? > > > > > > > > Maybe thinking that fewer vnodes means sharing it with fewer > > > > other file systems or ??? > > > > > > > > Anyhow, it seems to me that the formulae is bogus for small > > > > values of desiredvnodes (for example desiredvnodes == 1500 > > > > implies nm_wcommitsize == hibufspace, which sounds too large > > > > to me). > > > > > > > > I'm thinking that putting an upper limit of 10% of hibufspace > > > > might make sense. ie. Change the above to: > > > > > > > > if (desiredvnodes >= 11000) > > > > nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000); > > > > else > > > > nmp->nm_wcommitsize = hibufspace / 10; > > > > > > > > Anyone have comments or insight into this calculation? > > > > > > > > rick > > > > ps: jhb, I hope you don't mind. I emailed you first and then > > > > thought others might have some ideas, too. > > > > > > Oh no, this is fine. A broader discussion is probably warranted. I > > > honestly > > > don't know what the goal is. 
I do think it is an attempt to share with > > > other > > > file systems, but I'm not sure how desiredvnodes / 1000 is useful for > > > that. > > > It also seems that we can end up setting this woefully low as well. > > > That is, > > > I wonder if we need a minimum of 10% of hibufspace so that it can > > > scale > > > between 10% and 90% of hibufspace (but I'm not sure what you would use > > > to > > > pick the scaling factor sanely). To my mind what you really want to do > > > is > > > something like 'hibufspace / (number of active mounts)', but that will > > > not > > > really work correctly unless we recalculate the value on each mount > > > and > > > unmount operation. > > > > > > -- > > > John Baldwin > > Btw, this was done by r147280 6.5years ago, so the formula doesn't seem > > to be causing a lot of grief. Also of some interest is the fact that > > wcommitsize appears to have been setable on a per-mount-point-basis until > > mount_nfs(8) was converted to nmount(2). { There is no nmount option to set it. } > > > > Btw, when nm_wcommitsize is exceeded, writes become synchronous, so it affects > > how much write behind happens. This, in turn, affects how bursty (is this a real > > word? hopefully you get what I mean?) the write traffic to the server is. > > > > What I'm not sure about is what happens when multiple mounts use up the entire > > buffer cache with write behinds. I'll try a little experiment to see if I > > can find that out. (If making it large isn't detrimental, then I tend to > > agree that the above sets nm_wcommitsize very small.) > > > > Since "desiredvnodes" will seldom be less than 1000, I'm not going to > > rush to a solution. > > > > Anyone who has insight into what this formula should be, please let us know. > > The commit message tries to explain it, but it's more than just a > one-line change. 
> > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/nfsclient/nfs_vfsops.c#rev1.177 > > There's also an associated PR: > > http://www.freebsd.org/cgi/query-pr.cgi?pr=79208 The commit added the limit which is sensible, but it doesn't explain the logic for how the limit is computed (that is, why it uses desiredvnodes / 1000). -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Tue Aug 16 16:22:24 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4AA1B106566C for ; Tue, 16 Aug 2011 16:22:24 +0000 (UTC) (envelope-from tdb@carrick.bishnet.net) Received: from carrick.bishnet.net (carrick.bishnet.net [IPv6:2a01:348:132:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 136718FC18 for ; Tue, 16 Aug 2011 16:22:23 +0000 (UTC) Received: from carrick-users.bishnet.net ([2a01:348:132:51::10]) by carrick.bishnet.net with esmtps (TLSv1:AES256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1QtMPH-000DFv-CF for freebsd-fs@freebsd.org; Tue, 16 Aug 2011 17:22:23 +0100 Received: (from tdb@localhost) by carrick-users.bishnet.net (8.14.4/8.14.4/Submit) id p7GGMN8b050958 for freebsd-fs@freebsd.org; Tue, 16 Aug 2011 17:22:23 +0100 (BST) (envelope-from tdb) Date: Tue, 16 Aug 2011 17:22:23 +0100 From: Tim Bishop To: freebsd-fs@freebsd.org Message-ID: <20110816162223.GG7564@carrick-users.bishnet.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-PGP-Key: 0x5AE7D984, http://www.bishnet.net/tim/tim-bishnet-net.asc X-PGP-Fingerprint: 1453 086E 9376 1A50 ECF6 AE05 7DCE D659 5AE7 D984 User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Did zpool offline, then reboot, now "freebsd zfs i/o error - all block copies unavailable" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 
16:22:24 -0000

I suspected one of my disks (in a zpool mirror) had a problem, so I
thought I'd test this out by offlining the disk and rebooting.

Unfortunately this failed during the reboot with the following error:

freebsd zfs i/o error - all block copies unavailable

A search of the archive suggests people have hit this before, but not in
the same circumstance (I think).

I figured the pool being degraded might be the issue, so I booted from a
livecd and detached the offlined disk, leaving the pool in a good state.
Rebooted again but the same problem occurred.

I'm now back in the livecd reattaching the disk, but this'll take a long
time (ETA 24h). Fingers crossed it works!

Whilst I wait, does anybody have any ideas what went wrong?

Tim.

-- 
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984

From owner-freebsd-fs@FreeBSD.ORG Tue Aug 16 17:04:57 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 70115106564A for ; Tue, 16 Aug 2011 17:04:57 +0000 (UTC) (envelope-from tdb@carrick.bishnet.net) Received: from carrick.bishnet.net (carrick.bishnet.net [IPv6:2a01:348:132:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 36E318FC16 for ; Tue, 16 Aug 2011 17:04:57 +0000 (UTC) Received: from carrick-users.bishnet.net ([2a01:348:132:51::10]) by carrick.bishnet.net with esmtps (TLSv1:AES256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1QtN4S-000Gr9-LG for freebsd-fs@freebsd.org; Tue, 16 Aug 2011 18:04:56 +0100 Received: (from tdb@localhost) by carrick-users.bishnet.net (8.14.4/8.14.4/Submit) id p7GH4ucs064741 for freebsd-fs@freebsd.org; Tue, 16 Aug 2011 18:04:56 +0100 (BST) (envelope-from tdb) Date: Tue, 16 Aug 2011 18:04:56 +0100 From: Tim Bishop To: freebsd-fs@freebsd.org Message-ID: <20110816170456.GH7564@carrick-users.bishnet.net> References: <20110816162223.GG7564@carrick-users.bishnet.net> MIME-Version: 1.0 Content-Type: text/plain;
charset=us-ascii Content-Disposition: inline In-Reply-To: <20110816162223.GG7564@carrick-users.bishnet.net> X-PGP-Key: 0x5AE7D984, http://www.bishnet.net/tim/tim-bishnet-net.asc X-PGP-Fingerprint: 1453 086E 9376 1A50 ECF6 AE05 7DCE D659 5AE7 D984 User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: Did zpool offline, then reboot, now "freebsd zfs i/o error - all block copies unavailable" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Aug 2011 17:04:57 -0000 On Tue, Aug 16, 2011 at 05:22:23PM +0100, Tim Bishop wrote: > I suspected one of my disks (in a zpool mirror) had a problem, so I > thought I'd test this out by offlining the disk and rebooting. > > Unfortunately this failed during the reboot with the following error: > > freebsd zfs i/o error - all block copies unavailable > > A search of the archive suggests people have hit this before, but not in > the same circumstance (I think). > > I figured the pool being degraded might be the issue, so I booted from a > livecd and detached the offlined disk leaving the pool in a good state. > Rebooted again but the same problem occurred. > > I'm now back in the livecd reattaching the disk, but this'll take a long > time (ETA 24h). Fingers crossed it works! > > Whilst I wait, does anybody have any ideas what went wrong? Doh - simple I think. The cache file didn't match the real state of the system. Copied the one from the livecd's /boot/zfs to the single disk pool and it now boots fine. Tim.
-- Tim Bishop http://www.bishnet.net/tim/ PGP Key: 0x5AE7D984 From owner-freebsd-fs@FreeBSD.ORG Wed Aug 17 01:28:38 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CAD96106566C for ; Wed, 17 Aug 2011 01:28:38 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta03.emeryville.ca.mail.comcast.net (qmta03.emeryville.ca.mail.comcast.net [76.96.30.32]) by mx1.freebsd.org (Postfix) with ESMTP id AFBF68FC1C for ; Wed, 17 Aug 2011 01:28:38 +0000 (UTC) Received: from omta01.emeryville.ca.mail.comcast.net ([76.96.30.11]) by qmta03.emeryville.ca.mail.comcast.net with comcast id MDDQ1h0070EPchoA3DUaVs; Wed, 17 Aug 2011 01:28:34 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta01.emeryville.ca.mail.comcast.net with comcast id MDU91h00G1t3BNj8MDUPmC; Wed, 17 Aug 2011 01:28:39 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 50382102C1A; Tue, 16 Aug 2011 18:28:06 -0700 (PDT) Date: Tue, 16 Aug 2011 18:28:06 -0700 From: Jeremy Chadwick To: John Baldwin Message-ID: <20110817012806.GA29555@icarus.home.lan> References: <201108151343.14655.jhb@freebsd.org> <1730399830.175988.1313449094531.JavaMail.root@erie.cs.uoguelph.ca> <20110816022554.GA6018@icarus.home.lan> <201108160931.35626.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201108160931.35626.jhb@freebsd.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, onwahe@gmail.com Subject: Re: NFS calculation of max commit size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 01:28:38 -0000 On Tue, Aug 16, 2011 at 09:31:35AM -0400, John Baldwin wrote: > On Monday, August 15, 2011 10:25:54 pm Jeremy Chadwick wrote: > > On 
Mon, Aug 15, 2011 at 06:58:14PM -0400, Rick Macklem wrote: > > > John Baldwin wrote: > > > > On Sunday, August 07, 2011 6:47:46 pm Rick Macklem wrote: > > > > > A recent PR (kern/159351) noted that the following > > > > > calculation results in a divide-by-zero when > > > > > desiredvnodes < 1000. > > > > > > > > > > nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000); > > > > > > > > > > Just fixing the divide-by-zero is easy enough, but I'm not > > > > > sure what this calculation is trying to do. Making it a fraction > > > > > of "hibufspace" makes sense (nm_wcommitsize is the maximum # of > > > > > bytes of uncommitted data in the NFS client's buffer cache blocks, > > > > > if I understand it correctly), but why divide it by > > > > > > > > > > (desiredvnodes / 1000) ?? > > > > > > > > > > Maybe thinking that fewer vnodes means sharing it with fewer > > > > > other file systems or ??? > > > > > > > > > > Anyhow, it seems to me that the formulae is bogus for small > > > > > values of desiredvnodes (for example desiredvnodes == 1500 > > > > > implies nm_wcommitsize == hibufspace, which sounds too large > > > > > to me). > > > > > > > > > > I'm thinking that putting an upper limit of 10% of hibufspace > > > > > might make sense. ie. Change the above to: > > > > > > > > > > if (desiredvnodes >= 11000) > > > > > nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000); > > > > > else > > > > > nmp->nm_wcommitsize = hibufspace / 10; > > > > > > > > > > Anyone have comments or insight into this calculation? > > > > > > > > > > rick > > > > > ps: jhb, I hope you don't mind. I emailed you first and then > > > > > thought others might have some ideas, too. > > > > > > > > Oh no, this is fine. A broader discussion is probably warranted. I > > > > honestly > > > > don't know what the goal is. I do think it is an attempt to share with > > > > other > > > > file systems, but I'm not sure how desiredvnodes / 1000 is useful for > > > > that. 
> > > > It also seems that we can end up setting this woefully low as well. > > > > That is, > > > > I wonder if we need a minimum of 10% of hibufspace so that it can > > > > scale > > > > between 10% and 90% of hibufspace (but I'm not sure what you would use > > > > to > > > > pick the scaling factor sanely). To my mind what you really want to do > > > > is > > > > something like 'hibufspace / (number of active mounts)', but that will > > > > not > > > > really work correctly unless we recalculate the value on each mount > > > > and > > > > unmount operation. > > > > > > > > -- > > > > John Baldwin > > > Btw, this was done by r147280 6.5years ago, so the formula doesn't seem > > > to be causing a lot of grief. Also of some interest is the fact that > > > wcommitsize appears to have been setable on a per-mount-point-basis until > > > mount_nfs(8) was converted to nmount(2). { There is no nmount option to set it. } > > > > > > Btw, when nm_wcommitsize is exceeded, writes become synchronous, so it affects > > > how much write behind happens. This, in turn, affects how bursty (is this a real > > > word? hopefully you get what I mean?) the write traffic to the server is. > > > > > > What I'm not sure about is what happens when multiple mounts use up the entire > > > buffer cache with write behinds. I'll try a little experiment to see if I > > > can find that out. (If making it large isn't detrimental, then I tend to > > > agree that the above sets nm_wcommitsize very small.) > > > > > > Since "desiredvnodes" will seldom be less than 1000, I'm not going to > > > rush to a solution. > > > > > > Anyone who has insight into what this formula should be, please let us know. > > > > The commit message tries to explain it, but it's more than just a > > one-line change. 
> > > > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/nfsclient/nfs_vfsops.c#rev1.177 > > > > There's also an associated PR: > > > > http://www.freebsd.org/cgi/query-pr.cgi?pr=79208 > > The commit added the limit which is sensible, but it doesn't explain the logic > for how the limit is computed (that is, why it uses desiredvnodes / 1000). Understood -- what I was getting at was that the individuals responsible for the commit (there were multiple people who reviewed it) could be contacted and inquiries submitted. :-) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Aug 17 09:32:01 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 17C1B106564A for ; Wed, 17 Aug 2011 09:32:01 +0000 (UTC) (envelope-from prvs=1210f20b9f=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 8E2348FC08 for ; Wed, 17 Aug 2011 09:32:00 +0000 (UTC) X-MDAV-Processed: mail1.multiplay.co.uk, Wed, 17 Aug 2011 10:20:16 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Wed, 17 Aug 2011 10:20:16 +0100 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on mail1.multiplay.co.uk X-Spam-Level: X-Spam-Status: No, score=-5.0 required=6.0 tests=USER_IN_WHITELIST shortcircuit=ham autolearn=disabled version=3.2.5 Received: from r2d2 ([188.220.16.49]) by mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) (MDaemon PRO v10.0.4) with ESMTP id md50014632193.msg for ; Wed, 17 Aug 2011 10:20:14 +0100 X-MDRemoteIP: 188.220.16.49 X-Return-Path: prvs=1210f20b9f=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-fs@FreeBSD.ORG Message-ID: From: "Steven Hartland" To: "Jeremy
Chadwick" References: <20110728012437.GA23430@icarus.home.lan><20110728103234.GA33275@icarus.home.lan><20110728145917.GA37805@icarus.home.lan><2A07CD8AE6AE49A5BAED59A7E547D1F9@multiplay.co.uk><2D117F9F212A4CCBA6B7F51E8705BDB7@multiplay.co.uk><20110805033001.GA47366@icarus.home.lan><20110805044725.GA48395@icarus.home.lan><20110806041822.GA11439@icarus.home.lan> <42162705FC5E4E748A1A57285AECA49A@multiplay.co.uk> Date: Wed, 17 Aug 2011 10:20:51 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109 Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Questions about erasing an ssd to restore performance underFreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 09:32:01 -0000 ----- Original Message ----- From: "Steven Hartland" All our tests have now been successful so I've now submitted this patch as a PR, which I hope can be included in a future release, 9.0 maybe if its not too late :) http://www.freebsd.org/cgi/query-pr.cgi?pr=159833 Regards Steve ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk. 
From owner-freebsd-fs@FreeBSD.ORG Wed Aug 17 13:15:16 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E1F7210656B3; Wed, 17 Aug 2011 13:15:16 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 53DDE8FC0C; Wed, 17 Aug 2011 13:15:16 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AsAAADm+S06DaFvO/2dsb2JhbABChEiUSpBNgUABAQQBIwRSGwcHCgICDRkCWQYcEAKHVQSkSpFzgSyEDIEQBJMTkRE X-IronPort-AV: E=Sophos;i="4.68,240,1312171200"; d="scan'208";a="134665502" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 17 Aug 2011 09:15:15 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 48B8DB3F1E; Wed, 17 Aug 2011 09:15:15 -0400 (EDT) Date: Wed, 17 Aug 2011 09:15:15 -0400 (EDT) From: Rick Macklem To: Jeremy Chadwick Message-ID: <1313769356.247298.1313586915280.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110817012806.GA29555@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, onwahe@gmail.com Subject: Re: NFS calculation of max commit size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 13:15:17 -0000 Jeremy Chadwick wrote: > On Tue, Aug 16, 2011 at 09:31:35AM -0400, John Baldwin wrote: > > On Monday, August 15, 2011 10:25:54 pm Jeremy Chadwick wrote: > > > On Mon, Aug 15, 2011 at 06:58:14PM -0400, Rick 
Macklem wrote: > > > > John Baldwin wrote: > > > > > On Sunday, August 07, 2011 6:47:46 pm Rick Macklem wrote: > > > > > > A recent PR (kern/159351) noted that the following > > > > > > calculation results in a divide-by-zero when > > > > > > desiredvnodes < 1000. > > > > > > > > > > > > nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000); > > > > > > > > > > > > Just fixing the divide-by-zero is easy enough, but I'm not > > > > > > sure what this calculation is trying to do. Making it a > > > > > > fraction > > > > > > of "hibufspace" makes sense (nm_wcommitsize is the maximum # > > > > > > of > > > > > > bytes of uncommitted data in the NFS client's buffer cache > > > > > > blocks, > > > > > > if I understand it correctly), but why divide it by > > > > > > > > > > > > (desiredvnodes / 1000) ?? > > > > > > > > > > > > Maybe thinking that fewer vnodes means sharing it with fewer > > > > > > other file systems or ??? > > > > > > > > > > > > Anyhow, it seems to me that the formulae is bogus for small > > > > > > values of desiredvnodes (for example desiredvnodes == 1500 > > > > > > implies nm_wcommitsize == hibufspace, which sounds too large > > > > > > to me). > > > > > > > > > > > > I'm thinking that putting an upper limit of 10% of > > > > > > hibufspace > > > > > > might make sense. ie. Change the above to: > > > > > > > > > > > > if (desiredvnodes >= 11000) > > > > > > nmp->nm_wcommitsize = hibufspace / (desiredvnodes / 1000); > > > > > > else > > > > > > nmp->nm_wcommitsize = hibufspace / 10; > > > > > > > > > > > > Anyone have comments or insight into this calculation? > > > > > > > > > > > > rick > > > > > > ps: jhb, I hope you don't mind. I emailed you first and then > > > > > > thought others might have some ideas, too. > > > > > > > > > > Oh no, this is fine. A broader discussion is probably > > > > > warranted. I > > > > > honestly > > > > > don't know what the goal is. 
I do think it is an attempt to > > > > > share with > > > > > other > > > > > file systems, but I'm not sure how desiredvnodes / 1000 is > > > > > useful for > > > > > that. > > > > > It also seems that we can end up setting this woefully low as > > > > > well. > > > > > That is, > > > > > I wonder if we need a minimum of 10% of hibufspace so that it > > > > > can > > > > > scale > > > > > between 10% and 90% of hibufspace (but I'm not sure what you > > > > > would use > > > > > to > > > > > pick the scaling factor sanely). To my mind what you really > > > > > want to do > > > > > is > > > > > something like 'hibufspace / (number of active mounts)', but > > > > > that will > > > > > not > > > > > really work correctly unless we recalculate the value on each > > > > > mount > > > > > and > > > > > unmount operation. > > > > > > > > > > -- > > > > > John Baldwin > > > > Btw, this was done by r147280 6.5years ago, so the formula > > > > doesn't seem > > > > to be causing a lot of grief. Also of some interest is the fact > > > > that > > > > wcommitsize appears to have been setable on a > > > > per-mount-point-basis until > > > > mount_nfs(8) was converted to nmount(2). { There is no nmount > > > > option to set it. } > > > > > > > > Btw, when nm_wcommitsize is exceeded, writes become synchronous, > > > > so it affects > > > > how much write behind happens. This, in turn, affects how bursty > > > > (is this a real > > > > word? hopefully you get what I mean?) the write traffic to the > > > > server is. > > > > > > > > What I'm not sure about is what happens when multiple mounts use > > > > up the entire > > > > buffer cache with write behinds. I'll try a little experiment to > > > > see if I > > > > can find that out. (If making it large isn't detrimental, then I > > > > tend to > > > > agree that the above sets nm_wcommitsize very small.) > > > > > > > > Since "desiredvnodes" will seldom be less than 1000, I'm not > > > > going to > > > > rush to a solution. 
> > > > > > > > Anyone who has insight into what this formula should be, please > > > > let us know. > > > > > > The commit message tries to explain it, but it's more than just a > > > one-line change. > > > > > > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/nfsclient/nfs_vfsops.c#rev1.177 > > > > > > There's also an associated PR: > > > > > > http://www.freebsd.org/cgi/query-pr.cgi?pr=79208 > > > > The commit added the limit which is sensible, but it doesn't explain > > the logic > > for how the limit is computed (that is, why it uses desiredvnodes / > > 1000). > > Understood -- what I was getting at was that the individuals > responsible > for the commit (there were multiple people who reviewed it) could be > contacted > and inquiries submitted. :-) > I did email the original committer and have not heard back. (I didn't try the reviewer(s).) I'm going to start doing a little experimentation with this and will report back when I have something that might be of interest. I think that any fraction of hibufspace should be sufficient to avoid the deadlock. Also, since the buffer cache code doesn't use vnode locking these days, I'm not even sure if write backs are blocked by the write vnode op in progress. (ie. I'm not sure the deadlock it originally fixed would still happen without it.) rick > -- > | Jeremy Chadwick jdc at parodius.com | > | Parodius Networking http://www.parodius.com/ | > | UNIX Systems Administrator Mountain View, CA, US | > | Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Aug 17 13:52:38 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 34F6E106564A for ; Wed, 17 Aug 2011 13:52:37 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 6BCFF8FC0A for ; Wed, 17 Aug 2011 13:52:36 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p7HDqVXv042198 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 17 Aug 2011 16:52:31 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p7HDqUSG091356; Wed, 17 Aug 2011 16:52:30 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p7HDqUU3091354; Wed, 17 Aug 2011 16:52:30 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 17 Aug 2011 16:52:30 +0300 From: Kostik Belousov To: Rick Macklem Message-ID: <20110817135230.GW17489@deviant.kiev.zoral.com.ua> References: <20110817012806.GA29555@icarus.home.lan> <1313769356.247298.1313586915280.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="DTWYra+TaQJg/4dl" Content-Disposition: inline In-Reply-To: <1313769356.247298.1313586915280.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, 
DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, onwahe@gmail.com Subject: Re: NFS calculation of max commit size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 13:52:38 -0000 --DTWYra+TaQJg/4dl Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Aug 17, 2011 at 09:15:15AM -0400, Rick Macklem wrote: > > I think that any fraction of hibufspace should be sufficient to avoid > the deadlock. Also, since the buffer cache code doesn't use vnode locking > these days, I'm not even sure if write backs are blocked by the write > vnode op in progress. (ie. I'm not sure the deadlock it originally fixed > would still happen without it.) bufdaemon definitely acquires the vnode lock when flushing a dirty buffer; this was a problem on its own. I think you refer to the nfsiod operation. There is another op that is performed without holding the vnode lock consistently from (old)nfs code, namely, truncation. It would be useful to fix this. Please see r188386.
--DTWYra+TaQJg/4dl Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk5Lx54ACgkQC3+MBN1Mb4irygCgntPbEsHt+JVa1uL9BfJzv4Lz EBkAoORNKVitJTHM8xVseUyPQvSzpi1N =FgyM -----END PGP SIGNATURE----- --DTWYra+TaQJg/4dl-- From owner-freebsd-fs@FreeBSD.ORG Wed Aug 17 20:51:07 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C28B106566B for ; Wed, 17 Aug 2011 20:51:07 +0000 (UTC) (envelope-from Martin.Birgmeier@aon.at) Received: from email.aon.at (smtpout03.highway.telekom.at [195.3.96.115]) by mx1.freebsd.org (Postfix) with ESMTP id BA25A8FC1B for ; Wed, 17 Aug 2011 20:51:06 +0000 (UTC) Received: (qmail 15674 invoked from network); 17 Aug 2011 20:51:04 -0000 X-Spam-Checker-Version: SpamAssassin 3.2.0 (2007-05-01) on WARSBL604.highway.telekom.at X-Spam-Level: Received: from 188-23-212-142.adsl.highway.telekom.at (HELO gandalf.xyzzy) ([188.23.212.142]) (envelope-sender ) by smarthub77.res.a1.net (qmail-ldap-1.03) with AES256-SHA encrypted SMTP for ; 17 Aug 2011 20:51:03 -0000 Received: from atpcdvvc.xyzzy (atpcdvvc.xyzzy [IPv6:fec0:0:0:4d42::84]) by gandalf.xyzzy (8.14.4/8.14.4) with ESMTP id p7HKp33j024521 for ; Wed, 17 Aug 2011 22:51:03 +0200 (CEST) (envelope-from Martin.Birgmeier@aon.at) Message-ID: <4E4C29B7.3010806@aon.at> Date: Wed, 17 Aug 2011 22:51:03 +0200 From: Martin Birgmeier User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:5.0) Gecko/20110708 Thunderbird/5.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4E4657BD.2090803@aon.at> <20110816101229.GA2012@pm513-1.comsys.ntu-kpi.kiev.ua> In-Reply-To: <20110816101229.GA2012@pm513-1.comsys.ntu-kpi.kiev.ua> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Does nfse support specifying multiple exports for one mount point? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 20:51:07 -0000 Thank you Andrey. Regards, Martin On 08/16/11 12:12, Andrey Simonenko wrote: > On Sat, Aug 13, 2011 at 12:53:49PM +0200, Martin Birgmeier wrote: >> See http://www.freebsd.org/cgi/query-pr.cgi?pr=147881 - can I specify >> multiple exports with nfse? >> >> I am using the patch proposed in PR 147881, even though I believe it is >> incomplete (I read that somewhere). For me, it is working fine; for >> example, I have >> >> [0]# zfs list -o name,sharenfs hal.1/backup/dumps >> NAME SHARENFS >> hal.1/backup/dumps -network 192.168.0.0 -mask 255.255.0.0;-network >> fec0:0:0:4d42::/56 >> [0]# >> >> which in /etc/zfs/exports translates to >> >> /z/backup/dumps -network 192.168.0.0 -mask 255.255.0.0 >> /z/backup/dumps -network fec0:0:0:4d42::/56 >> >> How can I specify this using nfse? >> > PR/147881 proposes a way how to specify different options for different > address specifications in one line. Eg. different -mapall options for > different hosts in one line. > > > From the nfs.exports(5) manual page: after any address specification it is > possible to use already specified option in the same line and its value will > overwrite previous option's value and it will be used for next address > specification. > > Such options are: -mapall and -maproot, -ro and -rw, -sec. It is possible > to create reverse logic options for -no_* and -mnt_export_brief options > as well. > > The ``*'' hostname represents default export and can be used in a line with > other address specification. 
> > Example: > > % cat exports > /fs -ro -mapall 1:2:3 1.1.1.1 -sec krb5 -maproot 2:3:4 * 2.2.2.2 -rw 3.3.3.3 > % nfse -t exports > configure: reading file exports > > Pathname /fs > Export specifications: > -rw -sec krb5 -maproot 2:3:4 -host 3.3.3.3 > -ro -sec krb5 -maproot 2:3:4 -host 2.2.2.2 > -ro -sec sys -mapall 1:2:3 -host 1.1.1.1 > -ro -sec krb5 -maproot 2:3:4 * > > The exports(5) manual page says that address specifications must be specified > after options. The nfs.exports(5) file format allows to use options after > address specifications, so they can overwrite previously specified options. > > If you applied cddl.diff patch, then you can use zfs share/unshare to change > ZFS NFS exports, and of course they will be changed atomically and changes > will be applied only for one file system in a time. As a result if one used > zfs share/unshare for ZFS file system, then exports settings from other > exports files for this file system will be flushed. > > The -alldirs options is also supported by the "zfs share" command, but > its logic does not follow logic described in nfs.exports(5). If the -alldirs > options is used then nfse will create two exports: "/fs ..." and > "/fs -subdir -alldirs ...". This is because of logic how zfs share/unshare > works: > > 1. "zfs sharenfs ..." are not incremental. When we run "zfs sharenfs" > for some file system it completely substitutes its settings. > > 2. There is no way to pass several settings for one file system at least > for mountd. > > 3. "zfs sharenfs" does not allow to export subdirectories. > > If you unsure about configuration logic for nfse, then just call "nfse -t" > and verify its output. At any time run "nfse -c show" and verify current > NFS exports settings. If you prefer to run nfse(8) in compatible mode with > mountd(8), then run it with the -C switch. 
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > From owner-freebsd-fs@FreeBSD.ORG Wed Aug 17 22:18:55 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 75C87106566C for ; Wed, 17 Aug 2011 22:18:55 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 3154C8FC17 for ; Wed, 17 Aug 2011 22:18:54 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAO49TE6DaFvO/2dsb2JhbABChEmlIIFAAQEFIwRSGxgCAg0ZAlkGLK0KkVeBLIQMgRAEkxOREQ X-IronPort-AV: E=Sophos;i="4.68,241,1312171200"; d="scan'208";a="131408281" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 17 Aug 2011 18:18:53 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id B3BB3B402E; Wed, 17 Aug 2011 18:18:53 -0400 (EDT) Date: Wed, 17 Aug 2011 18:18:53 -0400 (EDT) From: Rick Macklem To: Kostik Belousov Message-ID: <1632122286.297610.1313619533702.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110817135230.GW17489@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, onwahe@gmail.com Subject: Re: NFS calculation of max commit size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 
2011 22:18:55 -0000 Kostik Belousov wrote: > On Wed, Aug 17, 2011 at 09:15:15AM -0400, Rick Macklem wrote: > > > > I think that any fraction of hibufspace should be sufficient to > > avoid > > the deadlock. Also, since the buffer cache code doesn't use vnode > > locking > > these days, I'm not even sure if write backs are blocked by the > > write > > vnode op in progress. (ie. I'm not sure the deadlock it originally > > fixed > > would still happen without it.) > > bufdaemon definitely acquires vnode lock when flushing dirty buffer, > this was a problem on its own. I think you refer to the nfsiod > operation. > Ok, so I think this means that the deadlock can still occur. I haven't yet played with the code, but I now think I might understand the logic behind dividing by "(desiredvnodes / 1000)". If a single large write is happening to one NFS vnode, setting nm_wcommitsize to any fraction of hibufspace should avoid the deadlock, I think. (If I understand it correctly, the deadlock occurs when an NFS VOP_WRITE() runs out of buffer cache and no buffer cache blocks can be cleaned out because it is holding a lock on the vnode.) But, what happens if K processes concurrently do large writes on K NFS vnodes? - It seems to me that they all could deadlock when the buffer cache becomes exhausted, since they all hold locks on their respective vnodes and, therefore, none of the dirty buffers can be flushed. - If this is correct, then I think the only "safe" answer is: nm_wcommitsize = hibufspace / desiredvnodes; since it is possible that almost all vnodes could be assigned to NFS files being written concurrently with large writes. However, this would result in an absurdly low value for nm_wcommitsize. --> My best guess is the original author assumed that 0.1% of all vnodes would be a reasonable upper bound on the number being written by NFS concurrently with large writes.
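For readers following the arithmetic, here is a minimal userland sketch of the limit as discussed in this thread. It assumes the guard proposed earlier (use the original formula only when desiredvnodes >= 11000, otherwise cap at 10% of hibufspace). The helper name wcommitsize_for() is hypothetical and only for illustration; in the kernel the result would be stored in nmp->nm_wcommitsize.

```c
#include <assert.h>

/*
 * Userland sketch of the nm_wcommitsize limit discussed in this thread.
 * wcommitsize_for() is a hypothetical helper, not a kernel function;
 * hibufspace and desiredvnodes are passed in so the arithmetic can be
 * tried outside the kernel.
 */
static long
wcommitsize_for(long hibufspace, long desiredvnodes)
{
	assert(hibufspace >= 0 && desiredvnodes >= 0);

	/*
	 * desiredvnodes / 1000 truncates to 0 below 1000 (the divide-by-zero
	 * reported in kern/159351) and stays below 10 until desiredvnodes
	 * reaches 10000. Use the original formula only from 11000 upward;
	 * otherwise cap the limit at 10% of hibufspace.
	 */
	if (desiredvnodes >= 11000)
		return (hibufspace / (desiredvnodes / 1000));
	return (hibufspace / 10);
}
```

With a made-up hibufspace of 1000000 bytes, desiredvnodes of 500 or 1500 both give 100000 (the 10% cap), while 20000 gives 1000000 / 20 = 50000.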
By the way, since nm_wcommitsize is applied to a single write, it only affects a single write(2) syscall of more than nm_wcommitsize bytes of data. (The PR refers to a writev() of 60Mbytes in size.) I honestly have no idea how many apps do write() syscalls of megabytes in size, so I'm not sure how important it would be to make it larger than "hibufspace / (desiredvnodes / 1000)", which is about 2Mbytes on the 256Mbyte laptop I have here without any tuning tweaks. I think there might be a better way to do this than calculating a fixed "guesstimate" for nm_wcommitsize and then using it for the life of the NFS mount. - The NFS VOP_WRITE() can keep track of a running total of how many bytes are being written: - add uio_resid to this running total at the beginning of the VOP_WRITE() and subtract it back out at the end of VOP_WRITE(). - if this running total exceeds something like 80% of hibufspace, then do synchronous writes (ie. use that test instead of if (nm_wcommitsize < uio->uio_resid) to make the decision). Does this sound reasonable to others? (This is actually getting interesting. Who would have guessed that a divide by zero bug report would lead to this...) rick > There is another op that is performed without holding the vnode lock > consistently from (old)nfs code, namely, truncation. It would be > useful > to fix this. Please see r188386.
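The running-total idea in the message above could be sketched roughly as follows. This is a userland illustration only: struct wb_account and both functions are hypothetical names, none of them exist in the FreeBSD NFS client, and a real kernel version would need locking or atomics around the counter. The point at which bytes are subtracted again is left to the caller here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical bookkeeping for the running-total proposal. Not kernel
 * code: there is no locking, and the names are made up for illustration.
 */
struct wb_account {
	long outstanding;	/* bytes currently counted as unwritten */
	long hibufspace;	/* buffer cache limit the total is compared to */
};

/* Count a write of resid bytes; return true if it should go synchronous. */
static bool
wb_account_start(struct wb_account *wa, long resid)
{
	assert(resid >= 0);
	wa->outstanding += resid;
	/* 80% threshold, computed with integer arithmetic. */
	return (wa->outstanding > wa->hibufspace / 5 * 4);
}

/* Remove resid bytes from the total once they are no longer outstanding. */
static void
wb_account_done(struct wb_account *wa, long resid)
{
	assert(resid >= 0 && resid <= wa->outstanding);
	wa->outstanding -= resid;
}
```

In this sketch the decision replaces the fixed per-mount comparison against nm_wcommitsize with a comparison against the bytes actually outstanding, so one large write on an otherwise idle client is not forced synchronous.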
From owner-freebsd-fs@FreeBSD.ORG Wed Aug 17 23:56:58 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 76F201065673 for ; Wed, 17 Aug 2011 23:56:58 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 30F658FC12 for ; Wed, 17 Aug 2011 23:56:58 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap0EADJUTE6DaFvO/2dsb2JhbABBhEmlIIFAAQYjVhsaAg0ZAlkGrQ+RUoEshAyBEASTE5ER X-IronPort-AV: E=Sophos;i="4.68,242,1312171200"; d="scan'208";a="131414575" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 17 Aug 2011 19:56:57 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 7FC75B3F64; Wed, 17 Aug 2011 19:56:57 -0400 (EDT) Date: Wed, 17 Aug 2011 19:56:57 -0400 (EDT) From: Rick Macklem To: Kostik Belousov Message-ID: <1075004291.300557.1313625417507.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110817135230.GW17489@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, onwahe@gmail.com Subject: Re: NFS calculation of max commit size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2011 23:56:58 -0000 Just to correct myself... 
- The NFS VOP_WRITE() can keep track of a running total of how many bytes is being written: - add uio_resid to this running total at the beginning of the VOP_WRITE() and subtract it back out at the end of VOP_WRITE(). This was incorrectly stated. The value should be subtracted back out when the write rpc completes (ie. buffer has been flushed), since the running total needs to be "how many unwritten NFS bytes are in the buffer cache". At least that was what I was/am thinking... rick From owner-freebsd-fs@FreeBSD.ORG Thu Aug 18 03:00:16 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D850E106566C for ; Thu, 18 Aug 2011 03:00:16 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from dmz-mailsec-scanner-5.mit.edu (DMZ-MAILSEC-SCANNER-5.MIT.EDU [18.7.68.34]) by mx1.freebsd.org (Postfix) with ESMTP id 674CA8FC0C for ; Thu, 18 Aug 2011 03:00:16 +0000 (UTC) X-AuditID: 12074422-b7ba7ae000000a14-99-4e4c803872bb Received: from mailhub-auth-2.mit.edu ( [18.7.62.36]) by dmz-mailsec-scanner-5.mit.edu (Symantec Messaging Gateway) with SMTP id 55.C9.02580.8308C4E4; Wed, 17 Aug 2011 23:00:08 -0400 (EDT) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-2.mit.edu (8.13.8/8.9.2) with ESMTP id p7I30Fm8011862; Wed, 17 Aug 2011 23:00:15 -0400 Received: from multics.mit.edu (MULTICS.MIT.EDU [18.187.1.73]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id p7I30DVZ001892 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Wed, 17 Aug 2011 23:00:15 -0400 (EDT) Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id p7I30Cva009453; Wed, 17 Aug 2011 23:00:12 -0400 (EDT) Date: Wed, 17 Aug 2011 23:00:12 -0400 (EDT) From: Benjamin Kaduk To: John In-Reply-To: <20110816040519.GA49864@FreeBSD.org> Message-ID: References: 
<20110816040519.GA49864@FreeBSD.org> User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrLIsWRmVeSWpSXmKPExsUixG6nomvR4ONncK5Ly2LOmw9MFsce/2Sz WL/yDasDs8eMT/NZAhijuGxSUnMyy1KL9O0SuDKW3f3DWHCDu6Jz3w+mBsZXnF2MnBwSAiYS rX3/mSBsMYkL99azdTFycQgJ7GOUeLPwIpSzgVFi4rmpLBDOASaJM92n2CGcBkaJ7surWEH6 WQS0Jf7dPA42i01ARWLmm41sILaIgJTE0zmXWUBsZgFziacfloHVCAPVL305H6yGU8BQ4uvd bcxdjBwcvAIOEk/PRIKEhQQMJI6vWAhWIiqgI7F6/xSwMbwCghInZz6BGmkp8W/tL9YJjIKz kKRmIUktYGRaxSibklulm5uYmVOcmqxbnJyYl5dapGuql5tZopeaUrqJERyqLko7GH8eVDrE KMDBqMTDa/jK20+INbGsuDL3EKMkB5OSKO/eWh8/Ib6k/JTKjMTijPii0pzU4kOMEhzMSiK8 bQpAOd6UxMqq1KJ8mJQ0B4uSOC/XTgc/IYH0xJLU7NTUgtQimKwMB4eSBG9zHVCjYFFqempF WmZOCUKaiYMTZDgP0HC2KpDhxQWJucWZ6RD5U4yKUuK8nvVACQGQREZpHlwvLJW8YhQHekWY dy5IFQ8wDcF1vwIazAQ0+NYuD5DBJYkIKakGxi6TxNkzqvIW/uN6FKPW8KbM4sd+q93+mdpd fxcZaoZb5tVdlr13t9PJYPvTLoUnnh+mXNvWbMSZ0R+er5ARKcuV1nz5J8u97TlntnNMbtsv oz7DsuhekPMfy2ld22/wBz69Pl3+w2WhQ0cv6DZ2GLvN0H+d78q04uBBrukPJWK7iux+y6l4 KLEUZyQaajEXFScCAL6rYnwAAwAA Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: Three LOR with latest -current X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 03:00:16 -0000 Hello John, These seem to be well-known, per http://ipv4.sources.zabbadoz.net/freebsd/lor.html On Tue, 16 Aug 2011, John wrote: > Hi folks, > > I'm seeing 3 lock order reversals with an up-to-date -current > system. Stock system, GENERIC kernel. Let me know if this isn't > enough information. Just booting the system and the dmesg. 
> > Thanks, > John > > > lock order reversal: > 1st 0xfffffe0289627db8 ufs (ufs) @ /usr/src.2011-08-14_10.53pm_EDT/sys/ufs/ffs/ffs_snapshot.c:425 > 2nd 0xffffff9f0db49778 bufwait (bufwait) @ /usr/src.2011-08-14_10.53pm_EDT/sys/kern/vfs_bio.c:2658 > 3rd 0xfffffe00404a8098 ufs (ufs) @ /usr/src.2011-08-14_10.53pm_EDT/sys/ufs/ffs/ffs_snapshot.c:546 This looks like #285. > > > lock order reversal: > 1st 0xffffff9f0db49778 bufwait (bufwait) @ /usr/src.2011-08-14_10.53pm_EDT/sys/kern/vfs_bio.c:2658 > 2nd 0xfffffe004034dcb0 snaplk (snaplk) @ /usr/src.2011-08-14_10.53pm_EDT/sys/ufs/ffs/ffs_snapshot.c:818 The line numbers are a bit off, but this could be #269. > > > lock order reversal: > 1st 0xfffffe004034dcb0 snaplk (snaplk) @ /usr/src.2011-08-14_10.53pm_EDT/sys/kern/vfs_vnops.c:301 > 2nd 0xfffffe0289627db8 ufs (ufs) @ /usr/src.2011-08-14_10.53pm_EDT/sys/ufs/ffs/ffs_snapshot.c:1620 And this would be #240. Since they are so commonly reported (but no deadlocks have been attributed to them), it seems likely that they are harmless. Perhaps we should tell WITNESS to not warn about them ... 
-Ben Kaduk From owner-freebsd-fs@FreeBSD.ORG Thu Aug 18 12:58:53 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 91A21106566B for ; Thu, 18 Aug 2011 12:58:53 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 2D3C68FC08 for ; Thu, 18 Aug 2011 12:58:52 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p7ICwnag045817 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 18 Aug 2011 15:58:49 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p7ICwnfA011247; Thu, 18 Aug 2011 15:58:49 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p7ICwnFt011246; Thu, 18 Aug 2011 15:58:49 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 18 Aug 2011 15:58:49 +0300 From: Kostik Belousov To: Rick Macklem Message-ID: <20110818125849.GE17489@deviant.kiev.zoral.com.ua> References: <20110817135230.GW17489@deviant.kiev.zoral.com.ua> <1632122286.297610.1313619533702.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ADZ8S6Yea/b683e6" Content-Disposition: inline In-Reply-To: <1632122286.297610.1313619533702.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, 
DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, onwahe@gmail.com Subject: Re: NFS calculation of max commit size X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Aug 2011 12:58:53 -0000
--ADZ8S6Yea/b683e6 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Wed, Aug 17, 2011 at 06:18:53PM -0400, Rick Macklem wrote:
> Kostik Belousov wrote:
> > On Wed, Aug 17, 2011 at 09:15:15AM -0400, Rick Macklem wrote:
> > >
> > > I think that any fraction of hibufspace should be sufficient to avoid
> > > the deadlock. Also, since the buffer cache code doesn't use vnode locking
> > > these days, I'm not even sure if write backs are blocked by the write
> > > vnode op in progress. (ie. I'm not sure the deadlock it originally fixed
> > > would still happen without it.)
> >
> > bufdaemon definitely acquires vnode lock when flushing dirty buffer,
> > this was a problem on its own. I think you refer to the nfsiod
> > operation.
> >
> Ok, so I think this means that the deadlock can still occur.
> I haven't yet played with the code, but I now think I might understand
> the logic behind dividing by "(desiredvnodes / 1000)".
>
> If a single large write is happening to one NFS vnode, setting
> nm_wcommitsize to any fraction of hibufspace should avoid the deadlock,
> I think. (If I understand it correctly, the deadlock occurs when an
> NFS VOP_WRITE() runs out of buffer cache and no buffer cache blocks
> can be cleaned out because it is holding a lock on the vnode.)

No, if the nfs write vop tries to allocate a new buffer, then vfs_bio.c
will note that an attempt is being made to allocate with the vnode lock
held, and will do a pass of the dirty cache, flushing buffers owned by
the vnode. See the call to buf_do_flush() from getnewbuf() and the
buf_do_flush() code itself. This is what I referred to as 'a problem on
its own'. The change helped to fix a bufdaemon deadlock you described,
which indeed happened relatively often.

>
> But, what happens if K processes concurrently do large writes on K
> NFS vnodes?
> - It seems to me that they all could deadlock when the buffer cache
>   becomes exhausted, since they all hold locks on their respective
>   vnodes and, therefore, none of the dirty buffers can be flushed.
> - If this is correct, then I think the only "safe" answer is:
>       nm_wcommitsize = hibufspace / desiredvnodes;
>   since it is possible that almost all vnodes could be assigned to
>   NFS files being written concurrently with large writes.
>   However, this would result in an absurdly low value for nm_wcommitsize.
>
> --> My best guess is the original author assumed that 0.1% of all vnodes
>     would be a reasonable upper bound on the number being written by NFS
>     concurrently with large writes.
>
> By the way, since nm_wcommitsize is applied to a single write, it only
> affects a single write(2) syscall of more than nm_wcommitsize bytes of
> data. (The PR refers to a writev() of 60Mbytes in size.)
> I honestly have no idea how many apps. do write() syscalls of megabytes
> in size, so I'm not sure how important it would be to make it larger
> than "hibufspace / (desiredvnodes / 1000)", which is about 2Mbytes on
> the 256Mbyte laptop I have here without any tuning tweaks?
>
> I think there might be a better way to do this than calculating a
> fixed "guesstimate" for nm_wcommitsize and then using it for the life
> of the NFS mount.
> - The NFS VOP_WRITE() can keep track of a running total of how many
>   bytes are being written:
>   - add uio_resid to this running total at the beginning of the VOP_WRITE()
>     and subtract it back out at the end of VOP_WRITE().
>   - if this running total exceeds something like 80% of hibufspace, then
>     do synchronous writes (ie. use that test instead of
>     if (nm_wcommitsize < uio->uio_resid)) to make the decision.
>
> Does this sound reasonable to others?
> (This is actually getting interesting. Who would have guessed that a
> divide by zero bug report would lead to this...)
>
> rick
> > There is another op that is performed without holding the vnode lock
> > consistently from (old)nfs code, namely, truncation. It would be useful
> > to fix this. Please see r188386.

--ADZ8S6Yea/b683e6 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk5NDIgACgkQC3+MBN1Mb4goIwCgsDM23cix0FchRJmbDXilSyZY JEkAoJ6o/edVJVLaeF50bY2E88rTPoWR =MwS4 -----END PGP SIGNATURE----- --ADZ8S6Yea/b683e6--

From owner-freebsd-fs@FreeBSD.ORG Fri Aug 19 17:40:33 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 579B71065673; Fri, 19 Aug 2011 17:40:33 +0000 (UTC) Date: Fri, 19 Aug 2011 17:40:33 +0000 From: Alexander Best To: freebsd-fs@freebsd.org Message-ID: <20110819174033.GA68015@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: probably embarrising SUJ question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 17:40:33 -0000

hi there, i recently saw somebody using mount -o async in combination with gjournal. i just wanted to ask, whether async can also be used with SUJ?
or will this put me in a dangerous situation, where my fs will get hosed after a crash? cheers. alex From owner-freebsd-fs@FreeBSD.ORG Fri Aug 19 18:19:48 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 57B2E1065673; Fri, 19 Aug 2011 18:19:48 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [70.36.157.235]) by mx1.freebsd.org (Postfix) with ESMTP id 380258FC1D; Fri, 19 Aug 2011 18:19:48 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id p7JHqHEt030978; Fri, 19 Aug 2011 10:52:17 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201108191752.p7JHqHEt030978@chez.mckusick.com> To: Alexander Best In-reply-to: <20110819174033.GA68015@freebsd.org> Date: Fri, 19 Aug 2011 10:52:17 -0700 From: Kirk McKusick X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY autolearn=failed version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com Cc: freebsd-fs@freebsd.org Subject: Re: probably embarrassing SUJ question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 18:19:48 -0000 > Date: Fri, 19 Aug 2011 17:40:33 +0000 > From: Alexander Best > To: freebsd-fs@freebsd.org > Subject: probably embarrassing SUJ question > > Hi there, > > I recently saw somebody using mount -o async in combination with > gjournal. I just wanted to ask, whether async can also be used with > SUJ? or will this put me in a dangerous situation, where my fs will > get hosed after a crash? > > cheers. > alex The async flag is incompatible with soft updates or journaled soft updates. 
But not to fear, we added a seatbelt in ffs_mount:

    /*
     * Soft updates is incompatible with "async",
     * so if we are doing softupdates stop the user
     * from setting the async flag in an update.
     * Softdep_mount() clears it in an initial mount
     * or ro->rw remount.
     */
    if (MOUNTEDSOFTDEP(mp)) {
        MNT_ILOCK(mp);
        mp->mnt_flag &= ~MNT_ASYNC;
        MNT_IUNLOCK(mp);
    }

So, nothing bad will happen.

    Kirk McKusick

From owner-freebsd-fs@FreeBSD.ORG Fri Aug 19 18:54:26 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 9C0A9106566C; Fri, 19 Aug 2011 18:54:26 +0000 (UTC) Date: Fri, 19 Aug 2011 18:54:26 +0000 From: Alexander Best To: Kirk McKusick Message-ID: <20110819185426.GA77630@freebsd.org> References: <20110819174033.GA68015@freebsd.org> <201108191752.p7JHqHEt030978@chez.mckusick.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201108191752.p7JHqHEt030978@chez.mckusick.com> Cc: freebsd-fs@freebsd.org Subject: Re: probably embarrassing SUJ question X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 18:54:26 -0000

On Fri Aug 19 11, Kirk McKusick wrote:
> > Date: Fri, 19 Aug 2011 17:40:33 +0000
> > From: Alexander Best
> > To: freebsd-fs@freebsd.org
> > Subject: probably embarrassing SUJ question
> >
> > Hi there,
> >
> > I recently saw somebody using mount -o async in combination with
> > gjournal. I just wanted to ask, whether async can also be used with
> > SUJ? or will this put me in a dangerous situation, where my fs will
> > get hosed after a crash?
> >
> > cheers.
> > alex
>
> The async flag is incompatible with soft updates or journaled
> soft updates.
> But not to fear, we added a seatbelt in ffs_mount:
>
>     /*
>      * Soft updates is incompatible with "async",
>      * so if we are doing softupdates stop the user
>      * from setting the async flag in an update.
>      * Softdep_mount() clears it in an initial mount
>      * or ro->rw remount.
>      */
>     if (MOUNTEDSOFTDEP(mp)) {
>         MNT_ILOCK(mp);
>         mp->mnt_flag &= ~MNT_ASYNC;
>         MNT_IUNLOCK(mp);
>     }
>
> So, nothing bad will happen.

ah..thanks a lot. good thing you provide such kind of seatbelts. :)

cheers.
alex

>
>     Kirk McKusick

From owner-freebsd-fs@FreeBSD.ORG Fri Aug 19 19:03:24 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 527311065747; Fri, 19 Aug 2011 19:03:24 +0000 (UTC) Date: Fri, 19 Aug 2011 19:03:24 +0000 From: Alexander Best To: freebsd-fs@freebsd.org Message-ID: <20110819190324.GA78837@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Subject: touch(1) not working on directories in an msdosfs(5) envirement X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 19:03:24 -0000

hi there,

can somebody confirm this issue? is it already known?

otaku% ll|grep HELL
drwxr-xr-x  1 arundel  arundel  16384 19 Aug 19:57 HELLO
-rw-r--r--  1 arundel  arundel      0 19 Aug 20:13 HELLO2
otaku% touch HELLO*
otaku% ll|grep HELL
drwxr-xr-x  1 arundel  arundel  16384 19 Aug 19:57 HELLO
-rw-r--r--  1 arundel  arundel      0 19 Aug 20:55 HELLO2

doing the same on a UFS2 partition works as expected.

cheers.
alex From owner-freebsd-fs@FreeBSD.ORG Fri Aug 19 19:40:33 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2C763106564A for ; Fri, 19 Aug 2011 19:40:33 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id DAFEE8FC1E for ; Fri, 19 Aug 2011 19:40:32 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAG+7Tk6DaFvO/2dsb2JhbABBhEukOYFAAQEBAQMBAQEgKyALGw4KAgINGQIpAQkYAQ0GCAcEARwEh1SnNZE7gSyEDIEQBJEFgg6REQ X-IronPort-AV: E=Sophos;i="4.68,252,1312171200"; d="scan'208";a="131640050" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 19 Aug 2011 15:40:31 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id B962BB3F2B; Fri, 19 Aug 2011 15:40:31 -0400 (EDT) Date: Fri, 19 Aug 2011 15:40:31 -0400 (EDT) From: Rick Macklem To: Alexander Best Message-ID: <1092971110.92110.1313782831745.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110819190324.GA78837@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: touch(1) not working on directories in an msdosfs(5) envirement X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 19:40:33 -0000 Alexander Best wrote: > hi there, > > can somebody confirm this issue? is it already known? 
> > otaku% ll|grep HELL > drwxr-xr-x 1 arundel arundel 16384 19 Aug 19:57 HELLO > -rw-r--r-- 1 arundel arundel 0 19 Aug 20:13 HELLO2 > otaku% touch HELLO* > otaku% ll|grep HELL > drwxr-xr-x 1 arundel arundel 16384 19 Aug 19:57 HELLO > -rw-r--r-- 1 arundel arundel 0 19 Aug 20:55 HELLO2 > Yes, FAT file systems do not maintain a directory modify time. (The original FAT12,16 structure didn't even have a modify time for the root dir.) Just like Windows. This causes issues when a FAT fs is exported via NFS and someone was going to experiment with an "in memory only" modify time for dirs, to minimize caching issues, but I haven't heard back from them lately. Apparently Mac OS X chooses to update the modify time that exists on FAT32 file systems, but that isn't Windows compatible. rick > doing the same on a UFS2 partition works as expected. > > > cheers. > alex > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Fri Aug 19 22:11:07 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 66848106566C for ; Fri, 19 Aug 2011 22:11:07 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 1F9518FC1D for ; Fri, 19 Aug 2011 22:11:06 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1QuXHL-00036w-US for freebsd-fs@freebsd.org; Sat, 20 Aug 2011 00:11:03 +0200 Received: from 208.88.188.90.adsl.tomsknet.ru ([90.188.88.208]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sat, 20 Aug 2011 00:11:03 +0200 Received: from vadim_nuclight by 208.88.188.90.adsl.tomsknet.ru with local (Gmexim 0.1 (Debian)) id 
1AlnuQ-0007hv-00 for ; Sat, 20 Aug 2011 00:11:03 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Vadim Goncharov Date: Fri, 19 Aug 2011 22:10:49 +0000 (UTC) Organization: Nuclear Lightning @ Tomsk, TPU AVTF Hostel Lines: 26 Message-ID: References: <20110819190324.GA78837@freebsd.org> <1092971110.92110.1313782831745.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: 208.88.188.90.adsl.tomsknet.ru X-Comment-To: Rick Macklem User-Agent: slrn/0.9.9p1 (FreeBSD) Subject: Re: touch(1) not working on directories in an msdosfs(5) envirement X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: vadim_nuclight@mail.ru List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 22:11:07 -0000 Hi Rick Macklem! On Fri, 19 Aug 2011 15:40:31 -0400 (EDT); Rick Macklem wrote about 'Re: touch(1) not working on directories in an msdosfs(5) envirement': > Yes, FAT file systems do not maintain a directory modify > time. (The original FAT12,16 structure didn't even have a > modify time for the root dir.) > Just like Windows. > This causes issues when a FAT fs is exported via NFS and > someone was going to experiment with an "in memory only" > modify time for dirs, to minimize caching issues, but I > haven't heard back from them lately. > Apparently Mac OS X chooses to update the modify time that > exists on FAT32 file systems, but that isn't Windows compatible. What? I've just now created a test directory and changed it's modify time in Far Manager on Windows 2000, in a FAT32 partition. In fact it allows to change all three directory times, creation and access, too. So, I conclude, the FAT supports it. -- WBR, Vadim Goncharov. 
ICQ#166852181 mailto:vadim_nuclight@mail.ru [Anti-Greenpeace][Sober FreeBSD zealot][http://nuclight.livejournal.com] From owner-freebsd-fs@FreeBSD.ORG Fri Aug 19 22:58:56 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 647691065670 for ; Fri, 19 Aug 2011 22:58:56 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 24A858FC12 for ; Fri, 19 Aug 2011 22:58:56 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAGbqTk6DaFvO/2dsb2JhbABBhEukOoFAAQEBAQMBAQEgBCcgCxsYAgINFgMCKQEJFQMBDQYIBwQBHASHVKc2kSmBLIQMgRAEkQWCDpER X-IronPort-AV: E=Sophos;i="4.68,253,1312171200"; d="scan'208";a="131656528" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 19 Aug 2011 18:58:55 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 5BC15B3F9F; Fri, 19 Aug 2011 18:58:55 -0400 (EDT) Date: Fri, 19 Aug 2011 18:58:55 -0400 (EDT) From: Rick Macklem To: vadim nuclight Message-ID: <1303085986.99226.1313794735324.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: touch(1) not working on directories in an msdosfs(5) envirement X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Aug 2011 22:58:56 -0000 Vadim Goncharov wrote: > Hi Rick Macklem! 
> > On Fri, 19 Aug 2011 15:40:31 -0400 (EDT); Rick Macklem wrote about > 'Re: touch(1) not working on directories in an msdosfs(5) envirement': > > > Yes, FAT file systems do not maintain a directory modify > > time. (The original FAT12,16 structure didn't even have a > > modify time for the root dir.) > > > Just like Windows. > > > This causes issues when a FAT fs is exported via NFS and > > someone was going to experiment with an "in memory only" > > modify time for dirs, to minimize caching issues, but I > > haven't heard back from them lately. > > > Apparently Mac OS X chooses to update the modify time that > > exists on FAT32 file systems, but that isn't Windows compatible. > > What? I've just now created a test directory and changed it's modify > time > in Far Manager on Windows 2000, in a FAT32 partition. In fact it > allows to > change all three directory times, creation and access, too. So, I > conclude, > the FAT supports it. > Well, FAT32 (not the root dir of FAT12 or FAT16) does have a modify time stored on disk for the directory entry for a directory. The case I was thinking of (because that was what affected NFS client caching) was the case where an entry is added to a directory. I just checked that and it does not change the directory's modify time when an entry is added to a directory (at least for Windows7 personal...). I'm not enough of a Windows guy to even know what "Far Manager" is, but I'm not surprised that there is a tool that can change it. 

msdosfs_setattr() in sys/fs/msdosfs/msdosfs_vnops.c definitely only does it
for non-directories:

    if (vp->v_type != VDIR) {
        if ((pmp->pm_flags & MSDOSFSMNT_NOWIN95) == 0 &&
            vap->va_atime.tv_sec != VNOVAL) {
            dep->de_flag &= ~DE_ACCESS;
            timespec2fattime(&vap->va_atime, 0,
                &dep->de_ADate, NULL, NULL);
        }
        if (vap->va_mtime.tv_sec != VNOVAL) {
            dep->de_flag &= ~DE_UPDATE;
            timespec2fattime(&vap->va_mtime, 0,
                &dep->de_MDate, &dep->de_MTime, NULL);
        }
        dep->de_Attributes |= ATTR_ARCHIVE;
        dep->de_flag |= DE_MODIFIED;
    }

I'm not the author of the above, but I had assumed that it was because
Windows doesn't normally update it. Obviously, the above code could easily
be changed (although I haven't tested that), if that is now considered
correct behaviour. (It might have been because the msdosfs is meant to
work for all FAT variants.)

rick

> --
> WBR, Vadim Goncharov. ICQ#166852181 mailto:vadim_nuclight@mail.ru
> [Anti-Greenpeace][Sober FreeBSD zealot][http://nuclight.livejournal.com]
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Sat Aug 20 04:11:05 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 621E7106564A; Sat, 20 Aug 2011 04:11:05 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 3A5AE8FC13; Sat, 20 Aug 2011 04:11:05 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p7K4B5Vd041462; Sat, 20 Aug 2011 04:11:05 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p7K4B5Ae041453;
Sat, 20 Aug 2011 04:11:05 GMT (envelope-from linimon) Date: Sat, 20 Aug 2011 04:11:05 GMT Message-Id: <201108200411.p7K4B5Ae041453@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-amd64@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/159930: [ufs] [panic] kernel core X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 04:11:05 -0000 Old Synopsis: kernel core New Synopsis: [ufs] [panic] kernel core Responsible-Changed-From-To: freebsd-amd64->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Sat Aug 20 04:10:35 UTC 2011 Responsible-Changed-Why: attempt to reclassify. http://www.freebsd.org/cgi/query-pr.cgi?pr=159930 From owner-freebsd-fs@FreeBSD.ORG Sat Aug 20 07:00:24 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2895D106566B for ; Sat, 20 Aug 2011 07:00:24 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 182208FC08 for ; Sat, 20 Aug 2011 07:00:24 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p7K70Nqb000173 for ; Sat, 20 Aug 2011 07:00:23 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p7K70NDo000172; Sat, 20 Aug 2011 07:00:23 GMT (envelope-from gnats) Date: Sat, 20 Aug 2011 07:00:23 GMT Message-Id: <201108200700.p7K70NDo000172@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Sergey Kandaurov Cc: Subject: Re: kern/159930: [ufs] [panic] kernel core X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: 
Sergey Kandaurov List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 07:00:24 -0000 The following reply was made to PR kern/159930; it has been noted by GNATS. From: Sergey Kandaurov To: bug-followup@FreeBSD.org, nospam@ofloo.net Cc: Subject: Re: kern/159930: [ufs] [panic] kernel core Date: Sat, 20 Aug 2011 10:58:29 +0400 Do you use "options QUOTA" ? How often do you experience this crash? Can you show the exact way to reproduce it? Can you check if the following patch helps you? Thanks. --- sys/ufs/ffs/ffs_inode.c 2010-06-14 06:09:06.000000000 +0400 +++ sys/ufs/ffs/ffs_inode.c 2010-12-09 15:25:28.000000000 +0300 @@ -215,7 +215,7 @@ osize = ip->i_din2->di_extsize; ip->i_din2->di_blocks -= extblocks; #ifdef QUOTA - (void) chkdq(ip, -extblocks, NOCRED, 0); + (void) chkdq(ip, -extblocks, NOCRED, FORCE); #endif vinvalbuf(vp, V_ALT, 0, 0); ffs_pages_remove(vp, @@ -290,7 +290,7 @@ UFS_UNLOCK(ump); } else { #ifdef QUOTA - (void) chkdq(ip, -datablocks, NOCRED, 0); + (void) chkdq(ip, -datablocks, NOCRED, FORCE); #endif softdep_setup_freeblocks(ip, length, needextclean ? 
IO_EXT | IO_NORMAL : IO_NORMAL); @@ -526,7 +526,7 @@ DIP_SET(ip, i_blocks, 0); ip->i_flag |= IN_CHANGE; #ifdef QUOTA - (void) chkdq(ip, -blocksreleased, NOCRED, 0); + (void) chkdq(ip, -blocksreleased, NOCRED, FORCE); #endif return (allerror); } -- wbr, pluknet From owner-freebsd-fs@FreeBSD.ORG Sat Aug 20 07:14:03 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E5F6F1065674 for ; Sat, 20 Aug 2011 07:14:02 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id 81AF98FC13 for ; Sat, 20 Aug 2011 07:14:01 +0000 (UTC) Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p7K7DxtU022662 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 20 Aug 2011 17:13:59 +1000 Date: Sat, 20 Aug 2011 17:13:59 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <1303085986.99226.1313794735324.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110820164559.Q872@besplex.bde.org> References: <1303085986.99226.1313794735324.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: vadim nuclight , freebsd-fs@freebsd.org Subject: Re: touch(1) not working on directories in an msdosfs(5) envirement X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 07:14:03 -0000 On Fri, 19 Aug 2011, Rick Macklem wrote: > Vadim Goncharov wrote: >> On Fri, 19 Aug 2011 15:40:31 -0400 (EDT); Rick Macklem wrote about >> ... 
>>> Apparently Mac OS X chooses to update the modify time that >>> exists on FAT32 file systems, but that isn't Windows compatible. >> >> What? I've just now created a test directory and changed its modify >> time >> in Far Manager on Windows 2000, in a FAT32 partition. In fact it >> allows to >> change all three directory times, creation and access, too. So, I >> conclude, >> the FAT supports it. >> > Well, FAT32 (not the root dir of FAT12 or FAT16) does have a modify > time stored on disk for the directory entry for a directory. In a previous reply, I might have misremembered the limitations of old FAT on directories. Now ISTR something before (?) FAT12 which only had the root directory. > The case I was thinking of (because that was what affected NFS client > caching) was the case where an entry is added to a directory. I just > checked that and it does not change the directory's modify time when > an entry is added to a directory (at least for Windows7 personal...). This is the intentional part of msdosfs's compatibility. > I'm not enough of a Windows guy to even know what "Far Manager" is, > but I'm not surprised that there is a tool that can change it. Me neither, but I know cygwin can do most things (but it is so slow that it is faster to reboot to run FreeBSD utilities to do anything involving more than a few hundred files -- even a simple find -name takes 10-100 times longer in cygwin).
> msdosfs_setattr() in sys/fs/msdosfs/msdosfs_vnops.c definitely only > does it for non-directories: > if (vp->v_type != VDIR) { > if ((pmp->pm_flags & MSDOSFSMNT_NOWIN95) == 0 && > vap->va_atime.tv_sec != VNOVAL) { > dep->de_flag &= ~DE_ACCESS; > timespec2fattime(&vap->va_atime, 0, > &dep->de_ADate, NULL, NULL); > } > if (vap->va_mtime.tv_sec != VNOVAL) { > dep->de_flag &= ~DE_UPDATE; > timespec2fattime(&vap->va_mtime, 0, > &dep->de_MDate, &dep->de_MTime, NULL); > } > dep->de_Attributes |= ATTR_ARCHIVE; > dep->de_flag |= DE_MODIFIED; > } Yes, the special case for directories is just a bug (except for ATTR_ARCHIVE). > I'm not the author of the above, but I had assumed that it was > because Windows doesn't normally update it. Obviously, the above > code could easily be changed (although I haven't tested that), if > that is now considered correct behaviour. (It might have been > because the msdosfs is meant to work for all FAT variants.) But this is msdosfs_setattr(), whose purpose is to update it. The non-update for directory changes seems to be only here in detrunc(): % /* % * Write out the updated directory entry. Even if the update fails % * we free the trailing clusters. % */ % dep->de_FileSize = length; % if (!isadir) % dep->de_flag |= DE_UPDATE|DE_MODIFIED; % allerror = vtruncbuf(DETOV(dep), cred, td, length, pmp->pm_bpcluster); I don't quite understand how extensions or changes in place either set or avoid setting DE_MODIFIED -- grep didn't seem to show any relevant settings -- maybe all cases go through detrunc(). 
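For readers following the timespec2fattime() calls quoted above: they end up filling 16-bit FAT date and time words such as de_MDate and de_MTime. Below is a minimal user-space sketch of that packing, assuming the standard FAT directory-entry layout (year since 1980, month, day; hour, minute, seconds divided by two). The names fat_stamp, fat_pack() and fat_seconds() are made up for illustration and are not the msdosfs kernel API; time zone handling is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two 16-bit on-disk words. */
struct fat_stamp {
	uint16_t date;	/* bits 15-9: year - 1980, 8-5: month, 4-0: day */
	uint16_t time;	/* bits 15-11: hour, 10-5: minute, 4-0: seconds / 2 */
};

/* Pack a broken-down local time into FAT date/time words. */
static struct fat_stamp
fat_pack(int year, int month, int day, int hour, int min, int sec)
{
	struct fat_stamp s;

	/* FAT can only represent 1980..2107; reject anything else. */
	assert(year >= 1980 && year <= 2107);
	assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
	assert(hour >= 0 && hour <= 23 && min >= 0 && min <= 59);
	assert(sec >= 0 && sec <= 59);

	s.date = (uint16_t)(((year - 1980) << 9) | (month << 5) | day);
	/* Only 5 bits for seconds, so the odd second is silently dropped. */
	s.time = (uint16_t)((hour << 11) | (min << 5) | (sec / 2));
	return (s);
}

/* Recover the seconds field; always even, which is the 2-second granularity. */
static int
fat_seconds(uint16_t fat_time)
{
	return ((fat_time & 0x1f) * 2);
}
```

With this layout, fat_pack(2011, 8, 20, 16, 14, 29) and fat_pack(2011, 8, 20, 16, 14, 28) produce the same time word, so a time ending in :29 reads back as :28.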
Bruce From owner-freebsd-fs@FreeBSD.ORG Sat Aug 20 07:50:11 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5E646106566B for ; Sat, 20 Aug 2011 07:50:11 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 4E7048FC12 for ; Sat, 20 Aug 2011 07:50:11 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p7K7oBJL069225 for ; Sat, 20 Aug 2011 07:50:11 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p7K7oBKE069224; Sat, 20 Aug 2011 07:50:11 GMT (envelope-from gnats) Date: Sat, 20 Aug 2011 07:50:11 GMT Message-Id: <201108200750.p7K7oBKE069224@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: dfilter@FreeBSD.ORG (dfilter service) Cc: Subject: Re: kern/157728: commit references a PR X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dfilter service List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 07:50:11 -0000 The following reply was made to PR kern/157728; it has been noted by GNATS. 
From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/157728: commit references a PR Date: Sat, 20 Aug 2011 07:43:25 +0000 (UTC) Author: mm Date: Sat Aug 20 07:43:10 2011 New Revision: 225022 URL: http://svn.freebsd.org/changeset/base/225022 Log: MFC r224814, r224855: MFC r224814 [1]: Fix race between dmu_objset_prefetch() invoked from zfs_ioc_dataset_list_next() and dsl_dir_destroy_check() indirectly invoked from dmu_recv_existing_end() via dsl_dataset_destroy() by not prefetching temporary clones, as these count as always inconsistent. In addition, do not prefetch hidden datasets at all as we are not going to process these later. Filed as Illumos Bug #1346 MFC r224855: zfs_ioctl.c: improve code readability in zfs_ioc_dataset_list_next() zvol.c: fix calling of dmu_objset_prefetch() in zvol_create_minors() by passing full instead of relative dataset name and prefetching all visible datasets to be processed later instead of just the pool name PR: kern/157728 [1] Tested by: Borja Marcos [1], mm Reviewed by: pjd Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c Directory Properties: stable/8/sys/ (props changed) stable/8/sys/amd64/include/xen/ (props changed) stable/8/sys/cddl/contrib/opensolaris/ (props changed) stable/8/sys/contrib/dev/acpica/ (props changed) stable/8/sys/contrib/pf/ (props changed) Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c Sat Aug 20 06:08:31 2011 (r225021) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c Sat Aug 20 07:43:10 2011 (r225022) @@ -1963,8 +1963,10 @@ top: uint64_t cookie = 0; int len = sizeof (zc->zc_name) - (p - zc->zc_name); - while (dmu_dir_list_next(os, len, p, NULL, &cookie) == 0) - (void) 
dmu_objset_prefetch(zc->zc_name, NULL); + while (dmu_dir_list_next(os, len, p, NULL, &cookie) == 0) { + if (!dataset_name_hidden(zc->zc_name)) + (void) dmu_objset_prefetch(zc->zc_name, NULL); + } } do { Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c Sat Aug 20 06:08:31 2011 (r225021) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c Sat Aug 20 07:43:10 2011 (r225022) @@ -2200,11 +2200,11 @@ zvol_create_minors(const char *name) p = osname + strlen(osname); len = MAXPATHLEN - (p - osname); - if (strchr(name, '/') == NULL) { - /* Prefetch only for pool name. */ - cookie = 0; - while (dmu_dir_list_next(os, len, p, NULL, &cookie) == 0) - (void) dmu_objset_prefetch(p, NULL); + /* Prefetch the datasets. */ + cookie = 0; + while (dmu_dir_list_next(os, len, p, NULL, &cookie) == 0) { + if (!dataset_name_hidden(osname)) + (void) dmu_objset_prefetch(osname, NULL); } cookie = 0; _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Sat Aug 20 08:59:08 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AED9D106564A for ; Sat, 20 Aug 2011 08:59:08 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx09.syd.optusnet.com.au (fallbackmx09.syd.optusnet.com.au [211.29.132.242]) by mx1.freebsd.org (Postfix) with ESMTP id 3054B8FC0A for ; Sat, 20 Aug 2011 08:59:07 +0000 (UTC) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by fallbackmx09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p7K6j4Y0016467 for ; Sat, 20 Aug 2011 16:45:05 
+1000 Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p7K6ix48003992 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 20 Aug 2011 16:45:00 +1000 Date: Sat, 20 Aug 2011 16:44:59 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <1092971110.92110.1313782831745.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110820145112.Y872@besplex.bde.org> References: <1092971110.92110.1313782831745.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, Alexander Best Subject: Re: touch(1) not working on directories in an msdosfs(5) envirement X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 08:59:08 -0000 On Fri, 19 Aug 2011, Rick Macklem wrote: > Alexander Best wrote: >> can somebody confirm this issue? is it already known? 
>> >> otaku% ll|grep HELL >> drwxr-xr-x 1 arundel arundel 16384 19 Aug 19:57 HELLO >> -rw-r--r-- 1 arundel arundel 0 19 Aug 20:13 HELLO2 >> otaku% touch HELLO* >> otaku% ll|grep HELL >> drwxr-xr-x 1 arundel arundel 16384 19 Aug 19:57 HELLO >> -rw-r--r-- 1 arundel arundel 0 19 Aug 20:55 HELLO2 This is fixed (hacked around to keep the diffs small) in my version: % Index: msdosfs_vnops.c % =================================================================== % RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v % retrieving revision 1.147 % diff -u -2 -r1.147 msdosfs_vnops.c % --- msdosfs_vnops.c 4 Feb 2004 21:52:53 -0000 1.147 % +++ msdosfs_vnops.c 12 Nov 2007 21:47:48 -0000 % @@ -457,5 +457,7 @@ % (error = VOP_ACCESS(ap->a_vp, VWRITE, cred, ap->a_td)))) % return (error); % +#if 0 % if (vp->v_type != VDIR) { % +#endif % if ((pmp->pm_flags & MSDOSFSMNT_NOWIN95) == 0 && % vap->va_atime.tv_sec != VNOVAL) { The main part of the fix is to just remove the special case for directories which just breaks utimes() on directories. Even DOS and Windows never had this brokenness. What DOS and Windows do specially for directories is not update their modification time when their contents is changed. FreeBSD is compatible with this, and the above special case is apparently the result of trying too hard to be compatible with DOS and Windows. % @@ -463,4 +465,7 @@ % unix2dostime(&vap->va_atime, &dep->de_ADate, % NULL, NULL); % + if (vp->v_type != VDIR) % + dep->de_Attributes |= ATTR_ARCHIVE; % + dep->de_flag |= DE_MODIFIED; % } % if (vap->va_mtime.tv_sec != VNOVAL) { Now that setting times on directories is unbroken, we have to be more careful with the archive bit. In DOS and Windows, the archive bit is meaningless for FAT* directories and is not set by any syscall that I know of (unlike for ffs IIRC). The above avoids setting it for directories. Not setting DE_MODIFIED is an unrelated micro-optimization (try harder not to set it when we didn't change anything). 
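The effect of the hunks so far can be modelled outside the kernel. The following is only a sketch under stated assumptions: the struct, the function name and the *_SK constants are hypothetical stand-ins (0x20 is the FAT archive attribute bit; the dirty-flag value is arbitrary), and the MSDOSFSMNT_NOWIN95 check on the atime is omitted. It is not the committed msdosfs code.

```c
#include <assert.h>
#include <stdbool.h>

#define VNOVAL_SK	(-1L)	/* stand-in for the kernel's VNOVAL sentinel */
#define ATTR_ARCHIVE_SK	0x20u	/* FAT archive attribute bit */
#define DE_MODIFIED_SK	0x01u	/* hypothetical "entry needs writing" flag */

/* Hypothetical, much reduced model of a denode. */
struct denode_model {
	bool		is_dir;
	long		atime;
	long		mtime;
	unsigned	attributes;
	unsigned	flags;
};

/*
 * Apply explicitly requested times.  Directories are no longer skipped,
 * but the archive bit is only set for non-directories, and the entry is
 * only marked dirty when a time was actually supplied.
 */
static void
model_set_times(struct denode_model *de, long atime, long mtime)
{
	if (atime != VNOVAL_SK) {
		de->atime = atime;
		if (!de->is_dir)
			de->attributes |= ATTR_ARCHIVE_SK;
		de->flags |= DE_MODIFIED_SK;
	}
	if (mtime != VNOVAL_SK) {
		de->mtime = mtime;
		if (!de->is_dir)
			de->attributes |= ATTR_ARCHIVE_SK;
		de->flags |= DE_MODIFIED_SK;
	}
}
```

In this model, setting the mtime of a directory updates the time and marks the entry dirty but leaves the archive bit alone, while a call with both times set to VNOVAL_SK changes nothing.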
% @@ -468,8 +473,11 @@ % unix2dostime(&vap->va_mtime, &dep->de_MDate, % &dep->de_MTime, NULL); % + if (vp->v_type != VDIR) % + dep->de_Attributes |= ATTR_ARCHIVE; % + dep->de_flag |= DE_MODIFIED; Similarly for the mtime. % } % - dep->de_Attributes |= ATTR_ARCHIVE; % - dep->de_flag |= DE_MODIFIED; This was moved earlier. % +#if 0 % } % +#endif Finish hacking away the special case for directories. % } % /* % @@ -494,5 +502,5 @@ % } % } % - return (deupdat(dep, 1)); % + return (deupdat(dep, 0)); Remove an unrelated pessimization (a synchronous update where even an asynchronous update is more than what is needed). Even ffs in Net/2 didn't have the full pessimization here -- it pessimized SETATTR on times but not on the more important ownership and permission attributes. ffs still had the pessimization for times in 4.4BSD-Lite2, but FreeBSD fixed it in 1998 (ufs_vnops.c 1.79; the fix was buried in a mega-commit with a content-free log message :-(), and I fixed it in my version of ffs long before that. msdosfs_setattr() is a little different from ufs_setattr(); in particular, it does the (previously synchronous) update for all successful calls while ufs_setattr() only ever did it for times, so the pessimization has a wider scope in msdosfs. % } The above is only the least serious of the bugs in msdosfs_setattr() :-(. With the above fix, plain touch works as well as possible -- it cannot work perfectly since setting of atimes is not always supported. But touch -r and, more importantly, cp -p only work as well as possible for root, since they use utimes() without the null timeptr arg that allows plain touch to work. A non-null timeptr arg ends up normally requiring root permissions for msdosfs where it normally doesn't require extra permissions for ffs, because ownership requirements for the non-null case cannot be satisfied by file systems that don't really support ownerships.
We fudge the ownerships and use weak checks on them in most places, but for utimes() we use strict checks that almost always fail: from my old version: % if (vap->va_flags != VNOVAL) { % if (vp->v_mount->mnt_flag & MNT_RDONLY) % return (EROFS); % if (cred->cr_uid != pmp->pm_uid && % (error = suser_cred(cred, PRISON_ROOT))) % return (error); The implementation of this check has changed significantly in -current, but its semantics and result haven't. The file must be owned by someone (pmp->pm_uid), and no one else except root has permission for utimes() with a non-null timeptr. This works right in ffs because the file can normally be owned by its rightful owner, but in msdosfs the owner is faked and there can be only one owner for the whole file system. I use owner root and group msdosfs for all msdosfs file systems. The group permissions allow non-root users in group msdosfs to do almost everything except this unimportant utimes() operation to almost all msdosfs files. % /* % * We are very inconsistent about handling unsupported % * attributes. We ignored the access time and the % * read and execute bits. We were strict for the other % * attributes. % * % * Here we are strict, stricter than ufs in not allowing % * users to attempt to set SF_SETTABLE bits or anyone to % * set unsupported bits. However, we ignore attempts to % * set ATTR_ARCHIVE for directories `cp -pr' from a more % * sensible filesystem attempts it a lot. % */ This comment is partly about the problem as it affects non-time attributes. There is a problem with cp -p for almost all attributes, since msdosfs doesn't really support them so cp -p from another file system that supports more of them must either fail, or the failures must be silently or unsilently ignored. cp -p unsilently ignores EPERM errors for *chown(), but doesn't ignore any (?) other error (exit status != 0).
This gives the rather silly handling for typical errors for cp -p to msdosfs: chown() usually fails at the syscall level but succeeds with a warning at the shell level, but the less important utimes() usually fails at both levels. There is a related problem with file time granularity. It is the usual case that the file system has a different granularity than the system and other file systems. When the target has more granularity than the source, it is usually impossible to duplicate the times. Having utimes() fail when the times cannot be duplicated would be too strict in general, but sometimes you would like to be strict. POSIX has only recently started addressing this problem. (Old?) FAT with its 2-second granularity has always been unable to represent the default 1-second granularity, and has always handled this by silently truncating to the granularity that it supports. My test directory for testing this mail shows another granularity problem: (the file system is FAT32 with Win95 long names): after mkdir of it, a stat utility on it gives: % file=z % ... % atime=Sat Aug 20 00:00:00 2011 (1313762400.0) % ctime=Sat Aug 20 16:14:29 2011 (1313820869.740000000) % mtime=Sat Aug 20 16:14:28 2011 (1313820868.0) This has the expected 2-second granularity for the mtime, but the other times are strange: - the atime is far in the past, and according to other tests has a granularity of at least 200 seconds - the ctime has a granularity of 100 msec. This differs significantly from the mtime's granularity, so the ctime is up to 1.99 seconds in advance of the mtime. This is probably a local bug -- I probably don't have the fix for confusion between the ctime and the creation time (birthtime). msdosfs only has a creation time so the ctime must be faked and should usually be the same as the mtime. But how does the creation time have more precision? In other tests, creat() of a file sets the mtime and ctime reasonably, but the atime remains with a fixed value far in the past. 
touch advances the mtime correctly, but doesn't update the ctime. This is consistent with the displayed ctime actually being the creation time. > Yes, FAT file systems do not maintain a directory modify > time. Er, yes they do... > (The original FAT12,16 structure didn't even have a > modify time for the root dir.) ... except for the root directory, they don't, and this doesn't depend on the version -- there is no directory entry for the root file system, so the modification time can't be stored in the usual place for root directories, only. > Just like Windows. I normally use cygwin for managing file times on Windows, and touch and cp -p work reasonably well with it. In particular, touch updates directory times. 15-25 years ago, I used DOS utilities to manage file times and don't remember any problems with touch. Even DOS 2.1 (?) has a syscall like utimes(). > This causes issues when a FAT fs is exported via NFS and > someone was going to experiment with an "in memory only" > modify time for dirs, to minimize caching issues, but I > haven't heard back from them lately. "Memory only" times must never escape to userland. Linux has (had?) file times in vfs, which makes many things easy, but old versions of Linux did let them escape to userland. I ran fussy POSIX conformance tests for times. The tests would succeed for non-POSIX file systems, but only due to the times being in memory, and then only while the files were cached in memory. > Apparently Mac OS X chooses to update the modify time that > exists on FAT32 file systems, but that isn't Windows compatible. Yes, it's a bug in Mac OS to be incompatible. However, I sometimes wish for a mount option to control this. Similarly for weakening of checking for attributes that cannot be set. Also, for file times, there is another annoying problem which might be best handled by a mount option: msdosfs file times change twice a year with daylight saving.
I sometimes back up msdosfs files to ffs where their times don't change like this, and would like an easy way to stop the changes. Moving across timezones might cause even more frequent changes, but this doesn't affect me. Bruce From owner-freebsd-fs@FreeBSD.ORG Sat Aug 20 09:42:54 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 1233) id 3600E1065674; Sat, 20 Aug 2011 09:42:54 +0000 (UTC) Date: Sat, 20 Aug 2011 09:42:54 +0000 From: Alexander Best To: Bruce Evans Message-ID: <20110820094254.GA66130@freebsd.org> References: <1092971110.92110.1313782831745.JavaMail.root@erie.cs.uoguelph.ca> <20110820145112.Y872@besplex.bde.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110820145112.Y872@besplex.bde.org> Cc: freebsd-fs@freebsd.org Subject: Re: touch(1) not working on directories in an msdosfs(5) envirement X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 20 Aug 2011 09:42:54 -0000 On Sat Aug 20 11, Bruce Evans wrote: > On Fri, 19 Aug 2011, Rick Macklem wrote: > > >Alexander Best wrote: > >>can somebody confirm this issue? is it already known? > >> > >>otaku% ll|grep HELL > >>drwxr-xr-x 1 arundel arundel 16384 19 Aug 19:57 HELLO > >>-rw-r--r-- 1 arundel arundel 0 19 Aug 20:13 HELLO2 > >>otaku% touch HELLO* > >>otaku% ll|grep HELL > >>drwxr-xr-x 1 arundel arundel 16384 19 Aug 19:57 HELLO > >>-rw-r--r-- 1 arundel arundel 0 19 Aug 20:55 HELLO2 > > This is fixed (hacked around to keep the diffs small) in my version: WOW! such a lot of detailed info on the subject and useful lines of code. you should really get back to being an active committer, so we all get the benefit of your code. 
;) > > % Index: msdosfs_vnops.c > % =================================================================== > % RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_vnops.c,v > % retrieving revision 1.147 > % diff -u -2 -r1.147 msdosfs_vnops.c > % --- msdosfs_vnops.c 4 Feb 2004 21:52:53 -0000 1.147 > % +++ msdosfs_vnops.c 12 Nov 2007 21:47:48 -0000 > % @@ -457,5 +457,7 @@ > % (error = VOP_ACCESS(ap->a_vp, VWRITE, cred, ap->a_td)))) > % return (error); > % +#if 0 > % if (vp->v_type != VDIR) { > % +#endif > % if ((pmp->pm_flags & MSDOSFSMNT_NOWIN95) == 0 && > % vap->va_atime.tv_sec != VNOVAL) { > > The main part of the fix is to just remove the special case for directories > which just breaks utimes() on directories. Even DOS and Windows never > had this brokenness. What DOS and Windows do specially for directories > is not update their modification time when their contents is changed. > FreeBSD is compatible with this, and the above special case is apparently > the result of trying too hard to be compatible with DOS and Windows. > > % @@ -463,4 +465,7 @@ > % unix2dostime(&vap->va_atime, &dep->de_ADate, > % NULL, NULL); > % + if (vp->v_type != VDIR) > % + dep->de_Attributes |= ATTR_ARCHIVE; > % + dep->de_flag |= DE_MODIFIED; > % } > % if (vap->va_mtime.tv_sec != VNOVAL) { > > Now that setting times on directories is unbroken, we have to be more > careful with the archive bit. In DOS and Windows, the archive bit is > meaningless for FAT* directories and is not set by any syscall that I > know of (unlike for ffs IIRC). The above avoids setting it for directories. > Not setting DE_MODIFIED is an unrelated micro-optimization (try harder not > to set it when we didn't change anything). > > % @@ -468,8 +473,11 @@ > % unix2dostime(&vap->va_mtime, &dep->de_MDate, > % &dep->de_MTime, NULL); > % + if (vp->v_type != VDIR) > % + dep->de_Attributes |= ATTR_ARCHIVE; > % + dep->de_flag |= DE_MODIFIED; > > Similarly for the mtime. 
> > % } > % - dep->de_Attributes |= ATTR_ARCHIVE; > % - dep->de_flag |= DE_MODIFIED; > > This was moved earlier. > > % +#if 0 > % } > % +#endif > > Finish hacking away the special case for directories. > > % } > % /* > % @@ -494,5 +502,5 @@ > % } > % } > % - return (deupdat(dep, 1)); > % + return (deupdat(dep, 0)); > > Remove an unrelated pessimization (a synchronous update where even an > asynchronous update is more than what is needed). Even ffs in Net/2 > didn't have the full pessimization here -- it pessimized SETATTR on > times but not on the more important ownership and permission attributes. > ffs still had the pessimization for times in 4.4BSD-Lite2, but FreeBSD > fixed it in 1998 (ufs_vnops.c 1.79; the fix was buried in a mega-commit > with a content-free log message :-(), and I fixed it in my version of > ffs long before that. msdosfs_setattr() is a little different from > ufs_setattr(); in particular, it does the (previously synchronous) update > for all successful calls while ufs_setattr() only ever did it for times, > so the pessimization has a wider scope in msdosfs. > > % } > > The above is only the least serious of the bugs in msdosfs_setattr() :-(. > With the above fix, plain touch works as well as possible -- it cannot > work perfectly since setting of atimes is not always supported. But > touch -r and, more importantly, cp -p only work as well as possible for > root, since they use utimes() without the null timeptr arg that allows > plain touch to work. A non-null timeptr arg ends up normally requiring > root permissions for msdosfs where it normally doesn't require extra > permissions for ffs, because ownership requirements for the non-null case > cannot be satisfied by file systems that don't really support ownerships.
> We fudge the ownerships and use weak checks on them in most places, but > for utimes() we use strict checks that almost always fail: from my old > version: > > % if (vap->va_flags != VNOVAL) { > % if (vp->v_mount->mnt_flag & MNT_RDONLY) > % return (EROFS); > % if (cred->cr_uid != pmp->pm_uid && > % (error = suser_cred(cred, PRISON_ROOT))) > % return (error); > > The implementation of this check has changed significantly in -current, > but its semantics and result haven't. The file must be owned by > someone (pmp->pm_uid), and no one else except root has permission for > utimes() with a non-null timeptr. This works right in ffs because the > file can normally be owned by its rightful owner, but in msdosfs the > owner is faked and there can be only one owner for the whole file > system. I use owner root and group msdosfs for all msdosfs file > systems. The group permissions allow non-root users in group msdosfs > to do almost everything except this unimportant utimes() operation > to almost all msdosfs files. > > % /* > % * We are very inconsistent about handling unsupported > % * attributes. We ignored the access time and the > % * read and execute bits. We were strict for the other > % * attributes. > % * > % * Here we are strict, stricter than ufs in not allowing > % * users to attempt to set SF_SETTABLE bits or anyone to > % * set unsupported bits. However, we ignore attempts to > % * set ATTR_ARCHIVE for directories `cp -pr' from a more > % * sensible filesystem attempts it a lot. > % */ > > This comment is partly about the problem as it affects non-time attributes. > There is a problem with cp -p for almost all attributes, since msdosfs > doesn't really support them so cp -p from another file system that supports > more of them must either fail, or the failures must be silently or > unsilently ignored. cp -p unsilently ignores EPERM errors for *chown(), > but doesn't ignore any (?) other error (exit status != 0).
This gives the > rather silly handling for typical errors for cp -p to msdosfs: chown() > usually fails at the syscall level but succeeds with a warning at the > shell level, but the less important utimes() usually fails at both levels. > > There is a related problem with file time granularity. It is the usual > case that the file system has a different granularity than the system > and other file systems. When the target has more granularity than the > source, it is usually impossible to duplicate the times. Having utimes() > fail when the times cannot be duplicated would be too strict in general, > but sometimes you would like to be strict. POSIX has only recently > started addressing this problem. (Old?) FAT with its 2-second granularity > has always been unable to represent the default 1-second granularity, and > has always handled this by silently truncating to the granularity that it > supports. > > My test directory for testing this mail shows another granularity problem: > (the file system is FAT32 with Win95 long names): after mkdir of it, a > stat utility on it gives: > > % file=z > % ... > % atime=Sat Aug 20 00:00:00 2011 (1313762400.0) > % ctime=Sat Aug 20 16:14:29 2011 (1313820869.740000000) > % mtime=Sat Aug 20 16:14:28 2011 (1313820868.0) > > This has the expected 2-second granularity for the mtime, but the other > times are strange: > - the atime is far in the past, and according to other tests has a > granularity of at least 200 seconds > - the ctime has a granularity of 100 msec. This differs significantly > from the mtime's granularity, so the ctime is up to 1.99 seconds in > advance of the mtime. This is probably a local bug -- I probably > don't have the fix for confusion between the ctime and the creation > time (birthtime). msdosfs only has a creation time so the ctime must > be faked and should usually be the same as the mtime. But how does > the creation time have more precision? 
> In other tests, creat() of a file sets the mtime and ctime reasonably, > but the atime remains with a fixed value far in the past. touch > advances the mtime correctly, but doesn't update the ctime. This is > consistent with the displayed ctime actually being the creation time. > > >Yes, FAT file systems do not maintain a directory modify > >time. > > Er, yes they do... > > >(The original FAT12,16 structure didn't even have a > >modify time for the root dir.) > > ... except for the root directory, they don't, and this doesn't depend on > the version -- there is no directory entry for the root file system, > so the modification time can't be stored in the usual place for root > directories, only. > > >Just like Windows. > > I normally use cygwin for managing file times on Windows, and touch and > cp -p work reasonably well with it. In particular, touch updates directory > times. 15-25 years ago, I used DOS utilities to manage file times and > don't remember any problems with touch. Even DOS 2.1 (?) has a syscall > like utimes(). > > >This causes issues when a FAT fs is exported via NFS and > >someone was going to experiment with an "in memory only" > >modify time for dirs, to minimize caching issues, but I > >haven't heard back from them lately. > > "Memory only" times must never escape to userland. Linux has (had?) > file times in vfs, which makes many things easy, but old versions of > Linux did let them escape to userland. I ran fussy POSIX conformance > tests for times. The tests would succeed for non-POSIX file systems, > but only due to the times being in memory, and then only while the > files were cached in memory. > > >Apparently Mac OS X chooses to update the modify time that > >exists on FAT32 file systems, but that isn't Windows compatible. > > Yes, it's a bug in Mac OS to be incompatible. However, I sometimes > wish for a mount option to control this. Similarly for weakening > of checking for attributes that cannot be set.
Also, for file > times, there is another annoying problem which might be best handled > by a mount option: msdosfs file times change twice a year with daylight > saving. I sometimes back up msdosfs files to ffs where their times > don't change like this, and would like an easy way to stop the changes. > Moving across timezones might cause even more frequent changes, but this > doesn't affect me. > > Bruce