From owner-freebsd-fs Mon Nov 11 11: 0:58 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7E51D37B404 for ; Mon, 11 Nov 2002 11:00:57 -0800 (PST)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0306743E3B for ; Mon, 11 Nov 2002 11:00:56 -0800 (PST) (envelope-from owner-bugmaster@freebsd.org)
Received: from freefall.freebsd.org (peter@localhost [127.0.0.1]) by freefall.freebsd.org (8.12.6/8.12.6) with ESMTP id gABJ0ux3073778 for ; Mon, 11 Nov 2002 11:00:56 -0800 (PST) (envelope-from owner-bugmaster@freebsd.org)
Received: (from peter@localhost) by freefall.freebsd.org (8.12.6/8.12.6/Submit) id gABJ0uAn073760 for fs@freebsd.org; Mon, 11 Nov 2002 11:00:56 -0800 (PST)
Date: Mon, 11 Nov 2002 11:00:56 -0800 (PST)
Message-Id: <200211111900.gABJ0uAn073760@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: peter set sender to owner-bugmaster@freebsd.org using -f
From: FreeBSD bugmaster
To: fs@FreeBSD.org
Subject: Current problem reports assigned to you
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

Current FreeBSD problem reports

Critical problems
Serious problems
Non-critical problems

S  Submitted    Tracker     Resp.  Description
-------------------------------------------------------------------------------
a  [2000/10/06] kern/21807  fs     [patches] Make System attribute correspon

1 problem total.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 12 1:48:28 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6E55E37B438 for ; Tue, 12 Nov 2002 01:48:25 -0800 (PST) Received: from pohoda.cz (pohoda.pohoda.cz [194.228.111.151]) by mx1.FreeBSD.org (Postfix) with SMTP id 9296043E3B for ; Tue, 12 Nov 2002 01:48:23 -0800 (PST) (envelope-from plusik@pohoda.cz) Received: (qmail 21400 invoked from network); 12 Nov 2002 09:48:24 -0000 Received: from plusik@pohoda.cz by pohoda.cz by uid 500 with qmail-scanner-1.15 ( Clear:. Processed in 0.049096 secs); 12 lis 2002 09:48:24 -0000 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 12 Nov 2002 09:48:23 -0000 Date: Tue, 12 Nov 2002 10:48:23 +0100 (CET) From: Tomas Pluskal To: , , Subject: seeking help to rewrite the msdos filesystem Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Hello, I believe that everybody here knows about the "slow msdosfs" problem, that is AFAIK caused by implementation without clustering. For me this is very annoying, because I use digital camera, and ZIP drive, and FAT on both of them. Speed is about 10 times lower than it could be.. I would like to rewrite the msdosfs driver to use clustering (in fact, I have chosen it as school project, so I have to do it anyway :). Is there anybody, who could spend few minutes and write me some information about how these filesystems are implemented, what should I read first, and what steps to follow to implement clustering ? 
I am ready to do the hard work :) Thanks Tomas Pluskal To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 12 3:10:34 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4894337B404 for ; Tue, 12 Nov 2002 03:10:29 -0800 (PST) Received: from swan.mail.pas.earthlink.net (swan.mail.pas.earthlink.net [207.217.120.123]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6484143E42 for ; Tue, 12 Nov 2002 03:10:29 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0020.cvx21-bradley.dialup.earthlink.net ([209.179.192.20] helo=mindspring.com) by swan.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 18BYw5-0005I5-00; Tue, 12 Nov 2002 03:10:26 -0800 Message-ID: <3DD0E002.914DA5EF@mindspring.com> Date: Tue, 12 Nov 2002 03:03:30 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Tomas Pluskal Cc: freebsd-fs@freebsd.org Subject: Re: seeking help to rewrite the msdos filesystem References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Tomas Pluskal wrote: > I believe that everybody here knows about the "slow msdosfs" problem, that > is AFAIK caused by implementation without clustering. No; mostly it's non-page aligned accesses, and the fact that serial access requires traversal of the entire file up to that point, and so becomes exponential, the further you get into the file. See: http://www.usenix.org/publications/library/proceedings/sf94/forin.html The intent of this paper was to make the DOS FS look good, by caching all FAT metadata, and disabling all normal OS caching in the FFS implementation. 
However, it does have a couple of good suggestions for speeding FAT file access up. The most useful is that you should cache the metadata information, so that you can traverse the list of FS blocks in memory, instead of on disk, and cache all of the directory information (for the same reasons). Because the directory entry is also the inode, in FAT, this has the effect of caching all the inodes, as well.

If you want to get fancy, then you would implement the metadata caching for file blocks as a btree, so that you could go to a given block in log2(# of blocks) compares, in memory, instead of spending a long time traversing sectors.

> For me this is very annoying, because I use digital camera, and ZIP drive,
> and FAT on both of them. Speed is about 10 times lower than it could be..
> I would like to rewrite the msdosfs driver to use clustering (in fact, I
> have chosen it as school project, so I have to do it anyway :).
>
> Is there anybody, who could spend few minutes and write me some
> information about how these filesystems are implemented, what should I
> read first, and what steps to follow to implement clustering ?
> I am ready to do the hard work :)

I don't think the issue is actually clustering. If the cluster size is set, you really don't have a choice, since your block chain is hung off that, so it's not like FreeBSD goes out of its way to pessimize access.

As far as alignment goes, make sure the MSDOSFS starts on a 4K boundary, and you should be OK; otherwise, every 4th block is going to span a page boundary, and you'll eat a page-in latency over it. This is less common these days, but it really depends on your ZIP disk partition layout (in most cases, there's an assumption of 64K per cylinder/head these days, and 64K is evenly divisible by 4K, which equals no alignment problem).
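To make the alignment point concrete, here is a small sketch (Python, not msdosfs code) that counts how many blocks straddle a page boundary. The 4K page size comes from the text above; the 1K block size and the 512-byte misalignment are assumptions chosen for illustration, and the function names are hypothetical.

```python
PAGE_SIZE = 4096   # VM page size in bytes (the 4K boundary discussed above)
BLOCK_SIZE = 1024  # assumed FS block size in bytes, for illustration only


def spans_page_boundary(start, length=BLOCK_SIZE, page=PAGE_SIZE):
    """True if the byte range [start, start + length) touches two pages."""
    return start // page != (start + length - 1) // page


def straddling_fraction(fs_start, nblocks=1024):
    """Fraction of blocks that cross a page boundary when the first block
    of the file system begins at byte offset fs_start on the device."""
    hits = sum(
        spans_page_boundary(fs_start + i * BLOCK_SIZE) for i in range(nblocks)
    )
    return hits / nblocks


aligned = straddling_fraction(0)       # file system starts on a 4K boundary
misaligned = straddling_fraction(512)  # starts one 512-byte sector later

print(aligned, misaligned)
```

Under these assumptions no block of the aligned layout crosses a page, while one block in four of the misaligned layout does, which is the "every 4th block" case.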
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 12 5:41: 8 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9F43137B401 for ; Tue, 12 Nov 2002 05:41:07 -0800 (PST) Received: from server3.fastmail.fm (server3.fastmail.fm [209.61.187.56]) by mx1.FreeBSD.org (Postfix) with ESMTP id 205EA43E6E for ; Tue, 12 Nov 2002 05:41:07 -0800 (PST) (envelope-from wohl@chessclub.com) Received: from server3.fastmail.fm (localhost [127.0.0.1]) by fastmail.fm (Postfix) with ESMTP id 536DB2FD5D; Tue, 12 Nov 2002 07:41:03 -0600 (CST) Received: from 127.0.0.1 ([127.0.0.1] helo=server3.fastmail.fm) by fastmail.fm with SMTP; Tue, 12 Nov 2002 07:41:03 -0600 Received: by server3.fastmail.fm (Postfix, from userid 99) id 49BA02FD55; Tue, 12 Nov 2002 07:41:03 -0600 (CST) Content-Disposition: inline Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="ISO-8859-1" MIME-Version: 1.0 X-Mailer: MIME::Lite 1.2 (F2.6; T1.001; A1.48; B2.12; Q2.03) From: "Aaron Wohl" To: freebsd-fs@FreeBSD.ORG Date: Tue, 12 Nov 2002 07:41:03 -0600 X-Epoch: 1037108463 X-Sasl-enc: ozoPM9teTigDZf3q3Ml3gw Subject: 4.7 using current vinum to snapshot? Message-Id: <20021112134103.49BA02FD55@server3.fastmail.fm> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org I am trying to use the current vinum in the stable release to do snapshoting. I setup a raid 1 let the added drive sync, stop the added drive, hook it up to a different volume, fsck it, mount it read only. It does work but I get a couple of error messages along the way. Is anyone else using vinum to do snapshoting? Is there some other order to do these steps in and not get the error messages? 
I'm aware of the notes in the vinum documentation about future directions for snapshotting...

## starting here the two disks are set up as mirror 1
## break the mirror and then mount the copy elsewhere so it can be backed up
vinum detach cplex
vinum setstate down cplex
vinum setstate down cplex.s0
# ignore error "Can't attach cplex to mir2: 1" in next step, it works anyway
vinum attach cplex mir2
fsck -y /dev/vinum/mir2
mount /dev/vinum/mir2 /mnt2

... backup /mnt2

## put the disk back into the mirror
umount /dev/vinum/mir2
vinum stop mir2
vinum detach cplex
vinum setstate obsolete cplex.s0
# ignore error 1 in next line
vinum attach cplex mirx
vinum start cplex.s0

From owner-freebsd-fs Tue Nov 12 8:11:29 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2F2B037B401; Tue, 12 Nov 2002 08:11:28 -0800 (PST)
Received: from mail.eecs.harvard.edu (bowser.eecs.harvard.edu [140.247.60.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id 9A7E043E75; Tue, 12 Nov 2002 08:11:27 -0800 (PST) (envelope-from ellard@eecs.harvard.edu)
Received: by mail.eecs.harvard.edu (Postfix, from userid 465) id 509D654C659; Tue, 12 Nov 2002 11:11:21 -0500 (EST)
Received: from localhost (localhost [127.0.0.1]) by mail.eecs.harvard.edu (Postfix) with ESMTP id 4947254C634; Tue, 12 Nov 2002 11:11:21 -0500 (EST)
Date: Tue, 12 Nov 2002 11:11:21 -0500 (EST)
From: Dan Ellard
To: freebsd-fs@FreeBSD.ORG,
Subject: how to control tagged queueing?
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

I'm experimenting with the effects of SCSI tagged queueing on file system performance.
Is there any kind of global toggle somewhere in the kernel to turn tagged queueing on and off, and/or knob to limit the number of outstanding tags? Tagged queue management all seems to be done at the device level, and I haven't found hooks for controlling it at a higher level (but I thought I'd ask before running off to write something). I'm running 4.6.2p4, in case things have changed. (If there's a nicer interface in 4.7, I'll install it immediately!) Thanks, -Dan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 12 10: 6:10 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C631437B401; Tue, 12 Nov 2002 10:06:08 -0800 (PST) Received: from testmail.wolves.k12.mo.us (testmail.wolves.k12.mo.us [207.160.214.10]) by mx1.FreeBSD.org (Postfix) with ESMTP id 45ADC43E75; Tue, 12 Nov 2002 10:06:08 -0800 (PST) (envelope-from cdillon@wolves.k12.mo.us) Received: by testmail.wolves.k12.mo.us (Postfix, from userid 1001) id 017C81A951; Tue, 12 Nov 2002 12:06:06 -0600 (CST) Received: from localhost (localhost [127.0.0.1]) by testmail.wolves.k12.mo.us (Postfix) with ESMTP id F0B1F1A947; Tue, 12 Nov 2002 12:06:06 -0600 (CST) Date: Tue, 12 Nov 2002 12:06:06 -0600 (CST) From: Chris Dillon To: Dan Ellard Cc: freebsd-fs@FreeBSD.ORG, Subject: Re: how to control tagged queueing? In-Reply-To: Message-ID: <20021112120159.Y41695-100000@duey.wolves.k12.mo.us> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Tue, 12 Nov 2002, Dan Ellard wrote: > I'm experimenting with the effects of SCSI tagged queueing on file > system performance. 
Is there any kind of global toggle somewhere in > the kernel to turn tagged queueing on and off, and/or knob to limit > the number of outstanding tags? Tagged queue management all seems > to be done at the device level, and I haven't found hooks for > controlling it at a higher level (but I thought I'd ask before > running off to write something). > > I'm running 4.6.2p4, in case things have changed. (If there's a > nicer interface in 4.7, I'll install it immediately!) man camcontrol Specifically: camcontrol tags [device id] [generic args] [-N tags] [-q] [-v] camcontrol negotiate [device id] [generic args] [-T enable|disable] -- Chris Dillon - cdillon(at)wolves.k12.mo.us FreeBSD: The fastest and most stable server OS on the planet - Available for IA32 (Intel x86) and Alpha architectures - IA64, PowerPC, UltraSPARC, ARM, and S/390 under development - http://www.freebsd.org No trees were harmed in the composition of this message, although some electrons were mildly inconvenienced. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 12 10:40:58 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 07A4437B401 for ; Tue, 12 Nov 2002 10:40:54 -0800 (PST) Received: from gull.mail.pas.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4F23B43E77 for ; Tue, 12 Nov 2002 10:40:53 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0174.cvx40-bradley.dialup.earthlink.net ([216.244.42.174] helo=mindspring.com) by gull.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 18Bfxr-00077v-00; Tue, 12 Nov 2002 10:40:43 -0800 Message-ID: <3DD14AD9.DF8D3580@mindspring.com> Date: Tue, 12 Nov 2002 10:39:21 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Tomas Pluskal Cc: 
freebsd-fs@freebsd.org
Subject: Re: seeking help to rewrite the msdos filesystem
References: <20021112134213.P32524-100000@localhost>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Tomas Pluskal wrote:
> > No; mostly it's non-page aligned accesses, and the fact that
> > serial access requires traversal of the entire file up to that
> > point, and so becomes exponential, the further you get into
> > the file.
>
> I believe that non-page aligned accesses would not make it 10 times
> slower, so main problem lies in the traversal of the file ?

You can lose a factor of 3 on this; basically, any 1K block that starts 512 bytes before the end of a page will end up faulting the next page to get its last 512-byte sector. The issue here is one of doubled latency, every 4th block in the FS.

In almost all optimization situations, after dealing with the obvious (e.g. repetition of operations that should not need to be repeated interior to loops, etc.), optimization will boil down to an issue of reducing latency on data propagation.

> Is it a problem even if I am reading the file from beginning to end ?

Yes.

> Why?

Sequential access has a specific optimization available, which is to use the last block to find the next block, but this fact is not cached.

You could specifically attempt to speed up sequential access by doing a cache-behind: remember that the last block was at offset N, and that its contents are pointed to at XXX, so that when you ask for offset N+1, you can go directly to the in-core block, instead of linearly re-traversing.

It's possible. The problem with doing this is: (1) It's not there *now*, and (2) there is no guarantee that XXX is still in core.
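The cache-behind idea can be sketched in a few lines. This is a user-space illustration in Python, not the msdosfs code: the FAT is modelled as a dictionary mapping each cluster to its successor, and `EOC`, `ChainWalker` and `make_fat` are hypothetical names introduced here. The sketch ignores the invalidation problem raised in point (2); it only shows how many FAT entries are visited with and without the remembered position.

```python
EOC = -1  # hypothetical end-of-chain marker for this sketch


class ChainWalker:
    """Maps a file-relative cluster index to an on-disk cluster number,
    remembering the last position so that index N+1 costs a single step."""

    def __init__(self, fat, first_cluster):
        self.fat = fat                     # fat[c] is the cluster following c
        self.first = first_cluster
        self.last_index = 0                # cache-behind: last index resolved
        self.last_cluster = first_cluster  # and the cluster it resolved to
        self.steps = 0                     # FAT entries visited so far

    def lookup(self, index, use_hint=True):
        if use_hint and index >= self.last_index:
            i, c = self.last_index, self.last_cluster
        else:
            i, c = 0, self.first           # walk again from the start
        while i < index:
            c = self.fat[c]
            if c == EOC:
                raise IndexError("offset past end of file")
            i += 1
            self.steps += 1
        self.last_index, self.last_cluster = index, c
        return c


def make_fat(chain):
    """Build a toy FAT from an ordered list of cluster numbers."""
    return {c: nxt for c, nxt in zip(chain, chain[1:] + [EOC])}


chain = [5, 6, 9, 10, 11, 40, 41, 2]
fat = make_fat(chain)

with_hint = ChainWalker(fat, chain[0])
without_hint = ChainWalker(fat, chain[0])
for n in range(len(chain)):
    with_hint.lookup(n)
    without_hint.lookup(n, use_hint=False)

print(with_hint.steps, without_hint.steps)
```

For a front-to-back read of n clusters, the hinted walker visits n-1 entries in total, while re-walking from the start visits n(n-1)/2 in this model.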
Cached data is hung off the vnode, not off the inode, and the only safe place to do the caching is in the inode, but there is no invalidation notification below the level of vnode, which is the caching object.

In other words, the metadata is cached off the vnode of the mount point, not off the in-core image of the inode of the file you are reading from, and even if it were pointed to off the inode, it would be invalidated, LRU, off the vnode of the mount point, so it could disappear out from under the inode that's pointing to it.

Hence the suggestion that metadata be explicitly cached, and thus explicitly referenced, with the file allocation table area of interest paged in from an in-core index.

> > http://www.usenix.org/publications/library/proceedings/sf94/forin.html
>
> I'll read this through.

Be aware that, like I said, the conclusions are a bit farcical and contrived. For all that, it presents a number of techniques that will be useful for speeding up FAT access by file offset.

One of the issues here is that you will need "last close" notification, and you are probably also interested in data caching to have a preference for "open files, largest to smallest, within an LRU consistency window"; this is really not counter-intuitive, if you think about a shorter file with 5 opens on it or lots of random access, vs. a longer file with less frequent access, but both of which are better cached than a small file with a lot of access (effectively, this is a Laffer curve).

Probably the best approach, if you have a limited cache, is to virtualize by offset in a btree, and over a certain depth, virtualize and retraverse with a lower persistence cache for the remainder of the data. So, say, if you btree'ed an index for a file, maybe the hard index would stop at every 4th block, or every 8th block. This would be enough to divide the offset of a new request by 8, then traverse 3 blocks to get to the data, rather than traversing 2^N+3 blocks for a btree depth of N.
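A rough sketch of the "hard index at every 8th block" idea follows. It is an illustration only: a flat Python list stands in for the btree, the FAT is a toy dictionary, and `EOC`, `STRIDE`, `build_sparse_index` and `lookup` are hypothetical names. The point is that a lookup jumps to the nearest indexed cluster and then follows only the remaining links.

```python
EOC = -1    # hypothetical end-of-chain marker
STRIDE = 8  # keep one index entry per 8 clusters


def build_sparse_index(fat, first_cluster, stride=STRIDE):
    """Walk the chain once and remember every stride-th cluster."""
    index = []
    c, i = first_cluster, 0
    while c != EOC:
        if i % stride == 0:
            index.append(c)
        c = fat[c]
        i += 1
    return index


def lookup(fat, index, n, stride=STRIDE):
    """Return (cluster, hops) for file-relative cluster number n.
    Raises IndexError or KeyError if n is past the end of the chain."""
    c = index[n // stride]  # jump to the nearest indexed cluster at or before n
    hops = n % stride       # then follow at most stride - 1 links
    for _ in range(hops):
        c = fat[c]
    return c, hops


# A deliberately fragmented 20-cluster chain for illustration.
chain = [100 + 3 * i for i in range(20)]
fat = {c: nxt for c, nxt in zip(chain, chain[1:] + [EOC])}
index = build_sparse_index(fat, chain[0])

print(index)
print(lookup(fat, index, 19))
```

The index costs one entry per 8 clusters, so the trade-off between memory and hops is set by the stride; a real implementation would also need the invalidation handling discussed above.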
[This would be easier with a whiteboard, but I can draw an "ASCII art" picture, if necessary]

> > However, it does have a couple of good suggestions for speeding
> > FAT file access up. The most useful is that you should cache
> > the metadata information, so that you can traverse the list of
> > FS blocks in memory, instead of on disk, and caching of all the
> > directory information (for the same reasons). Because the directory
> > entry is also the inode, in FAT, this has the effect of caching all
> > the inodes, as well.
>
> Is this metadata caching the main difference of FreeBSD from
> linux/windows, that makes it so slow ?

Windows certainly caches metadata. Windows also has other FS attributes that are rather odd, that will cause improved FS performance, at the expense of other system performance.

One specific example is that memory under Windows is obtained from a module called VMM32.VXD; everything which obtains memory from this module is required to register a "low memory condition" callback, which the system can call and say "give back those pages you can give back, since I am low on memory". The interface is written such that the system can ask for memory back, and then each subsystem can return multiple pages back to the system on each call to it by the system.

The VFAT32.VXD IFS module specifically gives memory back only a single page at a time, unfairly competing with other subsystems and user programs, which the documentation for the system specifically recommends should return as much memory as possible on each request.

If you are an FS developer writing code in the IFSMgr under Windows, and you don't know this, you will never be able to write FS code that is capable of competing, side by side in the same system, against a VFAT32 FS under benchmark conditions (e.g. as a file server vs. "NetBench").

I would have to examine the Linux FAT FS code in more detail to know what they currently do, with regard to caching.
When FS's were smaller, and Udo Walter was working on the code (say circa 1996), they did some aggressive caching, which they have probably tuned down, as Windows FS's and disk space have both increased.

> > I don't think the issue is actually clustering. If the cluster
> > size is set, you really don't have a choice, since your block
> > chain is hung off that, so it's not like FreeBSD goes out of its
> > way to pessimize access.
>
> I've been doing some experiments with iostat, and I've figured out that on
> FAT filesystem the kernel is generating very small IO requests (cluster
> size), but with ufs the request size is much bigger (128KB).
> I have thought that this was the problem.

This has more to do with sequential access. Technically, you can read a FAT cluster at a time instead of an FS block at a time, and you will achieve some multiplier on sequential access, but you will find that, under load, the fault rate for blocks will go up.

Also, even if you read 64K at a time, you will end up LRU'ing out the data that you don't access.

The issue is that UNIX files are accessed by offset, and FAT files are accessed by offset by chaining clusters from the start to the cluster of interest, and then reading blocks. Since you can't cache non-metadata references by offset at the inode level, and can only do it at the inode level as a reference into the device cache hung off the device vnode (which can then be faulted), there's an impedance mismatch.

If you force the caching in separate memory for as much of the FAT as you can, then that buys you the cluster offsets, and you can cache the blocks containing the chaining information, as well.

Rather than trying to design this for you on the fly, the best thing you can do is profile the code, so that you can compare how things are with how things end up. This is really needed, if you plan on using this as a school project, and want to write up papers.
Also, the actual empirical choice on cache depth and other tradeoffs are going to be based on intended access patterns. So if you want to get a good paper out of it, you'll probably implement as many types of instrumentation, tunables, and attempts to speed up what's slow in profile, that you can. The more you do, the better your results will be, and me telling you what's worked in the past any more will bias your efforts unfairly; I'd rather just give you a direction to look in, and have you reach whatever conclusions you reach. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 12 14: 3:51 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 35BC437B401; Tue, 12 Nov 2002 14:03:50 -0800 (PST) Received: from mail.eecs.harvard.edu (bowser.eecs.harvard.edu [140.247.60.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id AC02543E4A; Tue, 12 Nov 2002 14:03:49 -0800 (PST) (envelope-from ellard@eecs.harvard.edu) Received: by mail.eecs.harvard.edu (Postfix, from userid 465) id 58C5454C6EE; Tue, 12 Nov 2002 17:03:43 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by mail.eecs.harvard.edu (Postfix) with ESMTP id 404B354C630; Tue, 12 Nov 2002 17:03:43 -0500 (EST) Date: Tue, 12 Nov 2002 17:03:43 -0500 (EST) From: Dan Ellard To: Chris Dillon Cc: freebsd-fs@FreeBSD.ORG, Subject: Re: how to control tagged queueing? In-Reply-To: <20021112120159.Y41695-100000@duey.wolves.k12.mo.us> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Tue, 12 Nov 2002, Chris Dillon wrote: > > I'm experimenting with the effects of SCSI tagged queueing on file > > system performance. 
Is there any kind of global toggle somewhere in > > the kernel to turn tagged queueing on and off, and/or knob to limit > > the number of outstanding tags? Tagged queue management all seems > > to be done at the device level, and I haven't found hooks for > > controlling it at a higher level (but I thought I'd ask before > > running off to write something). > > > > I'm running 4.6.2p4, in case things have changed. (If there's a > > nicer interface in 4.7, I'll install it immediately!) > > man camcontrol > > Specifically: > > camcontrol tags [device id] [generic args] [-N tags] [-q] [-v] > camcontrol negotiate [device id] [generic args] [-T enable|disable] Thanks, that's exactly what I needed. And thanks to the other people who have responded! -Dan To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 12 16:28:16 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 57DAD37B401 for ; Tue, 12 Nov 2002 16:28:15 -0800 (PST) Received: from HAL9000.homeunix.com (12-232-220-15.client.attbi.com [12.232.220.15]) by mx1.FreeBSD.org (Postfix) with ESMTP id B44A143E3B for ; Tue, 12 Nov 2002 16:28:09 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id gAD0S82q004832; Tue, 12 Nov 2002 16:28:08 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id gAD0S84V004831; Tue, 12 Nov 2002 16:28:08 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Date: Tue, 12 Nov 2002 16:28:07 -0800 From: David Schultz To: Terry Lambert Cc: Tomas Pluskal , freebsd-fs@FreeBSD.ORG Subject: Re: seeking help to rewrite the msdos filesystem Message-ID: <20021113002807.GA4711@HAL9000.homeunix.com> Mail-Followup-To: Terry 
Lambert , Tomas Pluskal , freebsd-fs@FreeBSD.ORG References: <20021112134213.P32524-100000@localhost> <3DD14AD9.DF8D3580@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii:iso-8859-1" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <3DD14AD9.DF8D3580@mindspring.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Thus spake Terry Lambert : > This has more to do with sequential access. Technically, you can > read a FAT cluster at a time instead of an FS block at a time, and > you will achieve some multiplier on sequential access, but you will > find that under load, that the fault rate for blocks will go up. > > Also, even if you read 64K at a time, you will end up LRU'ing out > the data that you don't access. > > The issue is that UNIX files are accessed by offset, and FAT files > are accessed by offset by chaining clusters from the start to the > cluster of interest, and then reading blocks. Few people use FAT filesystems under heavy load as they do UFS. Basically, I think what he wants to do is speed up sequential reads for a single process doing, say, digital video editing. On a FAT FS that is relatively free of fragmentation, naïve read-ahead is likely to improve performance for this type of load, even though the next logical block in the file might not be the next physical block on the disk. IIRC, SMARTDRV does this. This approach is optimizing for the single-user case, but if you have several people using a single FAT FS at a time, you have much bigger problems. 
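How well naive read-ahead works depends on how contiguous the cluster chain is. The following Python sketch, using a toy FAT dictionary and the hypothetical names `EOC`, `contiguous_runs` and `readahead_hit_rate`, splits a chain into physically consecutive runs (each of which could in principle be fetched with one larger request) and measures how often "next physical cluster" is also "next logical cluster". It is an illustration of the argument, not SMARTDRV's or msdosfs's algorithm.

```python
EOC = -1  # hypothetical end-of-chain marker


def contiguous_runs(fat, first_cluster):
    """Split a cluster chain into (start_cluster, length) runs of
    physically consecutive clusters."""
    runs = []
    c = first_cluster
    start, length = c, 1
    while fat[c] != EOC:
        nxt = fat[c]
        if nxt == c + 1:
            length += 1                   # still inside the current run
        else:
            runs.append((start, length))  # run ended, start a new one
            start, length = nxt, 1
        c = nxt
    runs.append((start, length))
    return runs


def readahead_hit_rate(fat, first_cluster):
    """Fraction of chain links where the next logical cluster is also the
    next physical one, i.e. where naive read-ahead guesses correctly."""
    links = hits = 0
    c = first_cluster
    while fat[c] != EOC:
        links += 1
        hits += fat[c] == c + 1
        c = fat[c]
    return hits / links if links else 1.0


chain = [5, 6, 7, 8, 20, 21, 22, 3]
fat = {c: nxt for c, nxt in zip(chain, chain[1:] + [EOC])}

runs = contiguous_runs(fat, chain[0])
rate = readahead_hit_rate(fat, chain[0])
print(runs, rate)
```

On a file system that is relatively free of fragmentation the runs are long and the hit rate is close to 1, which is the case the read-ahead argument relies on.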
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 12 18:36:49 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 183C837B401 for ; Tue, 12 Nov 2002 18:36:48 -0800 (PST) Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80]) by mx1.FreeBSD.org (Postfix) with ESMTP id B669943E6E for ; Tue, 12 Nov 2002 18:36:46 -0800 (PST) (envelope-from grog@lemis.com) Received: by wantadilla.lemis.com (Postfix, from userid 1004) id A21B5518FF; Wed, 13 Nov 2002 13:06:44 +1030 (CST) Date: Wed, 13 Nov 2002 13:06:44 +1030 From: Greg 'groggy' Lehey To: Aaron Wohl Cc: freebsd-fs@FreeBSD.ORG Subject: Re: 4.7 using current vinum to snapshot? Message-ID: <20021113023644.GC2919@wantadilla.lemis.com> References: <20021112134103.49BA02FD55@server3.fastmail.fm> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20021112134103.49BA02FD55@server3.fastmail.fm> User-Agent: Mutt/1.4i Organization: The FreeBSD Project Phone: +61-8-8388-8286 Fax: +61-8-8388-8725 Mobile: +61-418-838-708 WWW-Home-Page: http://www.FreeBSD.org/ X-PGP-Fingerprint: 9A1B 8202 BCCE B846 F92F 09AC 22E6 F290 507A 4223 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Tuesday, 12 November 2002 at 7:41:03 -0600, Aaron Wohl wrote: > I am trying to use the current vinum in the stable release to do > snapshoting. I setup a raid 1 let the added drive sync, stop the added > drive, hook it up to a different volume, fsck it, mount it read only. > > It does work but I get a couple of error messages along the way. Is > anyone else using vinum to do snapshoting? Is there some other order to > do these steps in and not get the error messages? 
Im aware of the notes > in the vinum documentation about future directions for snapshoting... > > ## starting here the two disks are setup as mirror 1 > ## break the mirror and then mount the copy else where so it can be > backeed up > vinum detach cplex > vinum setstate down cplex > vinum setstate down cplex.s0 > # ignore error Can't attach cplex to mir2: 1 in next step it works > anyway > vinum attach cplex mir2 > fsck -y /dev/vinum/mir2 > mount /dev/vinum/mir2 /mnt2 > > ... backup /mnt2 > > ## put the disk back into the mirror > umount /dev/vinum/mir2 > vinum stop mir2 > vinum detach cplex > vinum setstate obsolete cplex.s0 > #ignore error 1 in next line > vinum attach cplex mirx > vinum start cplex.s0 It's very difficult to follow this description, especially since you ignore the error messages. Could you please take a look at the man page or the web site and send the information asked for there. Greg -- See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 12 23:35:27 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 64EFC37B401 for ; Tue, 12 Nov 2002 23:35:26 -0800 (PST) Received: from pintail.mail.pas.earthlink.net (pintail.mail.pas.earthlink.net [207.217.120.122]) by mx1.FreeBSD.org (Postfix) with ESMTP id 01ACC43E3B for ; Tue, 12 Nov 2002 23:35:26 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0207.cvx21-bradley.dialup.earthlink.net ([209.179.192.207] helo=mindspring.com) by pintail.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 18Bs3W-00033D-00; Tue, 12 Nov 2002 23:35:23 -0800 Message-ID: <3DD20037.1D11546A@mindspring.com> Date: Tue, 12 Nov 2002 23:33:11 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: David Schultz Cc: Tomas Pluskal 
, freebsd-fs@FreeBSD.ORG Subject: Re: seeking help to rewrite the msdos filesystem References: <20021112134213.P32524-100000@localhost> <3DD14AD9.DF8D3580@mindspring.com> <20021113002807.GA4711@HAL9000.homeunix.com> Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org David Schultz wrote: > > The issue is that UNIX files are accessed by offset, and FAT files > > are accessed by offset by chaining clusters from the start to the > > cluster of interest, and then reading blocks. > > Few people use FAT filesystems under heavy load as they do UFS. > Basically, I think what he wants to do is speed up sequential > reads for a single process doing, say, digital video editing. On > a FAT FS that is relatively free of fragmentation, naïve > read-ahead is likely to improve performance for this type of load, > even though the next logical block in the file might not be the > next physical block on the disk. IIRC, SMARTDRV does this. This > approach is optimizing for the single-user case, but if you have > several people using a single FAT FS at a time, you have much > bigger problems. That's why, in my first posting, I suggested that a one cluster reference cache-behind wasn't really enough to deal with the problem. FWIW, "multiuser" in this context could include multiple applications, such as a playback, a mixer, and an editor, so the "non-multiuser" argument for what you have to worry about on FAT is not a very good argument (or the cache-behind would be enough for sequential access).
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 13 0:55:53 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A66B037B401 for ; Wed, 13 Nov 2002 00:55:52 -0800 (PST) Received: from pohoda.cz (pohoda.pohoda.cz [194.228.111.151]) by mx1.FreeBSD.org (Postfix) with SMTP id 11C7443E7B for ; Wed, 13 Nov 2002 00:55:51 -0800 (PST) (envelope-from plusik@pohoda.cz) Received: (qmail 13182 invoked from network); 13 Nov 2002 08:55:51 -0000 Received: from plusik@pohoda.cz by pohoda.cz by uid 500 with qmail-scanner-1.15 ( Clear:. Processed in 0.048324 secs); 13 lis 2002 08:55:51 -0000 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 13 Nov 2002 08:55:51 -0000 Date: Wed, 13 Nov 2002 09:55:51 +0100 (CET) From: Tomas Pluskal To: Terry Lambert Cc: Subject: Re: seeking help to rewrite the msdos filesystem Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org > > This has more to do with sequential access. Technically, you can > read a FAT cluster at a time instead of an FS block at a time, and > you will achieve some multiplier on sequential access, but you will > find that under load, that the fault rate for blocks will go up. When I read from my ZIP drive, according to iostat the request size is 2KB. When I run dd with 2KB request size: # dd if=/dev/afd0 of=/dev/null bs=2048 count=100 100+0 records in 100+0 records out 204800 bytes transferred in 2.127448 secs (96266 bytes/sec) If I understand this right, I can never get faster then 96KB/s with sequential access, when using 2KB requests ? 
It is quite slow :) Tomas Pluskal To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 13 2: 6:13 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6C05637B401 for ; Wed, 13 Nov 2002 02:06:11 -0800 (PST) Received: from scaup.mail.pas.earthlink.net (scaup.mail.pas.earthlink.net [207.217.120.49]) by mx1.FreeBSD.org (Postfix) with ESMTP id 24F0E43E97 for ; Wed, 13 Nov 2002 02:06:08 -0800 (PST) (envelope-from tlambert2@mindspring.com) Received: from pool0207.cvx21-bradley.dialup.earthlink.net ([209.179.192.207] helo=mindspring.com) by scaup.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 18BuPN-0005JU-00; Wed, 13 Nov 2002 02:06:05 -0800 Message-ID: <3DD22326.74544EAF@mindspring.com> Date: Wed, 13 Nov 2002 02:02:14 -0800 From: Terry Lambert X-Mailer: Mozilla 4.79 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Tomas Pluskal Cc: freebsd-fs@freebsd.org Subject: Re: seeking help to rewrite the msdos filesystem References: <20021113094824.N1339-100000@localhost.localdomain> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Tomas Pluskal wrote: > > This has more to do with sequential access. Technically, you can > > read a FAT cluster at a time instead of an FS block at a time, and > > you will achieve some multiplier on sequential access, but you will > > find that under load, that the fault rate for blocks will go up. > > When I read from my ZIP drive, according to iostat the request size is > 2KB. 
When I run dd with 2KB request size: > > # dd if=/dev/afd0 of=/dev/null bs=2048 count=100 > 100+0 records in > 100+0 records out > 204800 bytes transferred in 2.127448 secs (96266 bytes/sec) > > If I understand this right, I can never get faster then 96KB/s with > sequential access, when using 2KB requests ? It is quite slow :) Uh... the way you are using it here doesn't involve MSDOSFS at all. What happens if you say: # dd if=/dev/afd0 of=/dev/null bs=2048 count=100 # dd if=/dev/afd0 of=/dev/null bs=2048 count=100 Does the second one complete out of cache, and therefore faster? # dd if=/dev/afd0 of=/dev/null bs=204800 count=1 # dd if=/dev/afd0 of=/dev/null bs=204800 count=1 Are the requests still 2K according to iostat? If so, is it because it's a device driver limitation, or a hardware limitation of the ZIP disks themselves? Does the second one complete out of cache, and therefore faster? - I'll assume that the answers to the above questions are, in order, "no, no, N/A", unless you want to contradict. Is this a SCSI ZIP disk? The fastest possible read time you can possibly get out of any disk is to read SCSI mode page 2, and read a track at a time, so that you avoid track-to-track seeks in the middle of reads, using tagged commands to interleave requests to amortize a single seek latency across all requests combined. - I'll assume that the answer is "no"; basically, you will have to learn to live with a 1.5 times seek latency per virtual "fixed size track" read. Assuming all that... Most likely, the 2K is because that's the underlying FS block size. There is no optimization for sequential reads in MSDOSFS because the block offset requires metadata access, which is going to cause a seek, and there's no sequential optimization (see msdosfs_bmap() in /sys/fs/msdosfs/msdosfs_vnops.c), for lack of available metadata (without another seek and read of the FAT table... unless it's cached; see pcbmap() in msdosfs_fat.c). 
You probably want an is_sequential() to avoid really, really pessimizing random I/O. You should also look at the large block comment above the manifest constants defined for fc_setcache() in denode.h; you can see that it tries to implement the "one behind" entry I was talking about previously, but that this doesn't really help you, so there are either multiple opens, directory traversals, or other things going on because of the application you are running. Finally, note that the cache for the entire FAT, or as much of the FAT, and its locality as you can afford, LRU'ed, is not mapped, per the MACH MSDOSFS paper reference. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 13 3: 5:19 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9850237B401 for ; Wed, 13 Nov 2002 03:05:18 -0800 (PST) Received: from HAL9000.homeunix.com (12-232-220-15.client.attbi.com [12.232.220.15]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0D7DC43E7B for ; Wed, 13 Nov 2002 03:05:18 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id gADB5G2q006642; Wed, 13 Nov 2002 03:05:16 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id gADB5GLe006641; Wed, 13 Nov 2002 03:05:16 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU) Date: Wed, 13 Nov 2002 03:05:15 -0800 From: David Schultz To: Tomas Pluskal Cc: Terry Lambert , freebsd-fs@FreeBSD.ORG Subject: Re: seeking help to rewrite the msdos filesystem Message-ID: <20021113110515.GA6287@HAL9000.homeunix.com> Mail-Followup-To: Tomas Pluskal , Terry Lambert , freebsd-fs@FreeBSD.ORG References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline In-Reply-To: Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Thus spake Tomas Pluskal : > > This has more to do with sequential access. Technically, you can > > read a FAT cluster at a time instead of an FS block at a time, and > > you will achieve some multiplier on sequential access, but you will > > find that under load, that the fault rate for blocks will go up. > > When I read from my ZIP drive, according to iostat the request size is > 2KB. When I run dd with 2KB request size: > > # dd if=/dev/afd0 of=/dev/null bs=2048 count=100 > 100+0 records in > 100+0 records out > 204800 bytes transferred in 2.127448 secs (96266 bytes/sec) > > If I understand this right, I can never get faster then 96KB/s with > sequential access, when using 2KB requests ? It is quite slow :) Terry's main point, I think, is that if you just request extra blocks from the disk when you do a read without thinking about it, you will only be improving performance for a defragmented FAT FS with few concurrent accesses. In the presence of fragmentation, which FAT is highly susceptible to, it's likely that the read ahead will fetch the wrong block and merely waste time and space. It's not as simple as ``2K requests are slow, but 16K requests are fast.'' The crucial difference is that in FFS, once you've cached the inode and any indirect blocks, you know exactly how logical positions in the file map to physical disk locations. Moreover, blocks of a given file are usually grouped together, so clustering is a reasonable thing to do. In FAT, you have to fetch the first N-1 blocks of the file before you know where the Nth block is on disk. Reading ahead is a way of speculating, which may or may not be beneficial, as described above. That said, I wouldn't mind if someone was willing to fix up the msdosfs stuff. 
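The point that FAT needs the first N-1 links of the chain before it can locate the Nth block can be illustrated with a toy in-memory table. This is only a sketch under stated assumptions, not the msdosfs code: the list layout, the FAT_EOF marker value and the name nth_cluster() are hypothetical and chosen purely for illustration.

```python
FAT_EOF = 0xFFFF  # hypothetical end-of-chain marker for this toy table


def nth_cluster(fat, start, n):
    """Return (cluster, fat_entries_read) for the n-th cluster of a file.

    fat[c] holds the cluster that follows c; n = 0 is the start cluster.
    Returns (None, entries_read) if the chain ends before n is reached.
    """
    cluster = start
    reads = 0
    for _ in range(n):
        cluster = fat[cluster]  # one FAT lookup per link in the chain
        reads += 1
        if cluster == FAT_EOF:
            return None, reads
    return cluster, reads


# A fragmented four-cluster file: 2 -> 5 -> 3 -> 9 -> end of chain.
fat = [0] * 10
fat[2], fat[5], fat[3], fat[9] = 5, 3, 9, FAT_EOF

for n in range(4):
    cluster, reads = nth_cluster(fat, 2, n)
    print(f"logical cluster {n}: physical cluster {cluster}, FAT entries read: {reads}")
```

The number of FAT entries read grows with the logical offset, and the physical clusters are not in ascending order, which is why blind read-ahead of the next physical block can fetch the wrong data on a fragmented volume.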
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 13 4: 5:37 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1063B37B401; Wed, 13 Nov 2002 04:05:34 -0800 (PST) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 65E1643E3B; Wed, 13 Nov 2002 04:05:32 -0800 (PST) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id XAA27605; Wed, 13 Nov 2002 23:05:21 +1100 Date: Wed, 13 Nov 2002 23:17:53 +1100 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Tomas Pluskal Cc: freebsd-fs@FreeBSD.ORG, , Subject: Re: seeking help to rewrite the msdos filesystem In-Reply-To: Message-ID: <20021113221729.N381-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Tue, 12 Nov 2002, Tomas Pluskal wrote: > I believe that everybody here knows about the "slow msdosfs" problem, that > is AFAIK caused by implementation without clustering. Which problem? msdosfs has a number of small problems. Mostly they don't matter. > For me this is very annoying, because I use digital camera, and ZIP drive, > and FAT on both of them. Speed is about 10 times lower than it could be.. ZIP drives have much larger speed problems than msdosfs. msdosfs happens to be a good way to get the worst out of them. They have a minimum i/o overhead of 20 msec (at least for all the 100MB ones that I tried), so if you use msdosfs's minimum block size of 512 then their maximum speed is 25K/sec which is about 40 times slower than it could be.
The default block size of 2K gives a speed which is about 10 times slower than it could be. The ffs default block size of 16K gives a speed which is only about 1.25 times slower than it could be. E.g.: %%% Script started on Wed Nov 13 22:13:53 2002 ttyv1:root@gamplex:/tmp> newfs /dev/afd0 /dev/afd0: 96.0MB (196608 sectors) block size 16384, fragment size 2048 using 4 cylinder groups of 24.02MB, 1537 blks, 3200 inodes. super-block backups (for fsck -b #) at: 32, 49216, 98400, 147584 newfs: ioctl (DIOCWDINFO): /dev/afd0: can't rewrite disk label: Operation not supported by device ttyv1:root@gamplex:/tmp> mount /dev/afd0 /mnt ttyv1:root@gamplex:/tmp> dd if=/dev/zero of=/mnt/zz bs=1m count=20 time umount /mnt 20+0 records in 20+0 records out 20971520 bytes transferred in 18.827154 secs (1113898 bytes/sec) ttyv1:root@gamplex:/tmp> time umount /mnt 0.29 real 0.00 user 0.02 sys ttyv1:root@gamplex:/tmp> newfs_msdos -b 16384 /dev/afd0 /dev/afd0: 196512 sectors in 6141 FAT16 clusters (16384 bytes/cluster) bps=512 spc=32 res=1 nft=2 rde=512 mid=0xf0 spf=24 spt=32 hds=64 hid=0 bsec=196608 ttyv1:root@gamplex:/tmp> mount -t msdosfs /dev/afd0 /mnt ttyv1:root@gamplex:/tmp> dd if=/dev/zero of=/mnt/zz bs=1m count=20 time umount /mnt 20+0 records in 20+0 records out 20971520 bytes transferred in 27.729786 secs (756281 bytes/sec) ttyv1:root@gamplex:/tmp> time umount /mnt 5.57 real 0.00 user 0.03 sys ttyv1:root@gamplex:/tmp> exit Script done on Wed Nov 13 22:16:06 2002 %%% The above "could be" calculations are based on a speed of 1000K/sec. My test drive can't quite reach this using raw reads with a block size of 64K, but ffs clusters the data so well that it exceeds this speed for writes. msdosfs with a block size of 16K achieves about 63% of this speed (not the 87.5% suggested by the naive calculations). My times are with some small improvements which I think don't affect the tests much (they affect latency more than throughput). 
With lots of small files (smaller than the block size), clustering makes even less difference; however, msdosfs doesn't support soft updates or async mounts so it is about as slow as plain ffs (in my test of writing 1000 files of size 512, msdosfs is actually only 5 times slower than ffs with soft updates or async; plain ffs is about 7.5 times slower). Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 13 4:33:26 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 42BC937B401; Wed, 13 Nov 2002 04:33:24 -0800 (PST) Received: from mailout09.sul.t-online.com (mailout09.sul.t-online.com [194.25.134.84]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7526F43E6E; Wed, 13 Nov 2002 04:33:13 -0800 (PST) (envelope-from Alexander@Leidinger.net) Received: from fwd06.sul.t-online.de by mailout09.sul.t-online.com with smtp id 18BwhZ-000206-08; Wed, 13 Nov 2002 13:33:01 +0100 Received: from Andro-Beta.Leidinger.net (520065502893-0001@[217.83.22.193]) by fmrl06.sul.t-online.com with esmtp id 18BwhL-0wnAJcC; Wed, 13 Nov 2002 13:32:47 +0100 Received: from Magelan.Leidinger.net (Magelan [192.168.1.1]) by Andro-Beta.Leidinger.net (8.12.6/8.12.6) with ESMTP id gADCWcqu003401; Wed, 13 Nov 2002 13:32:38 +0100 (CET) (envelope-from Alexander@Leidinger.net) Received: from Magelan.Leidinger.net (netchild@localhost [127.0.0.1]) by Magelan.Leidinger.net (8.12.6/8.12.6) with SMTP id gADCX1dh001822; Wed, 13 Nov 2002 13:33:01 +0100 (CET) (envelope-from Alexander@Leidinger.net) Date: Wed, 13 Nov 2002 13:33:01 +0100 From: Alexander Leidinger To: Bruce Evans Cc: plusik@pohoda.cz, freebsd-fs@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG, freebsd-current@FreeBSD.ORG Subject: Re: seeking help to rewrite the msdos filesystem Message-Id: <20021113133301.767d8a4d.Alexander@Leidinger.net> In-Reply-To: 
<20021113221729.N381-100000@gamplex.bde.org> References: <20021113221729.N381-100000@gamplex.bde.org> X-Mailer: Sylpheed version 0.8.5claws (GTK+ 1.2.10; i386-portbld-freebsd5.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Sender: 520065502893-0001@t-dialin.net Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Wed, 13 Nov 2002 23:17:53 +1100 (EST) Bruce Evans wrote: > My times are with some small improvements which I think don't affect > the tests much (they affect latency more than throughput). With lots > of small files (smaller than the block size), clustering doesn't makes > even less difference; however, msdosfs doesn't support soft updates > or async mounts so it it is about as slow as plain ffs (in my test of > writing 1000 files of size 512, msdosfs is actually only 5 times slower > than ffs with soft updates or async; plain ffs is about 7.5 times slower). mtools feels faster (yes, no measurement, pure subjective observation). Bye, Alexander. -- If Bill Gates had a dime for every time a Windows box crashed... ...Oh, wait a minute, he already does. 
http://www.Leidinger.net Alexander @ Leidinger.net GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 13 4:46:43 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 991CF37B401 for ; Wed, 13 Nov 2002 04:46:41 -0800 (PST) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 779D343E6E for ; Wed, 13 Nov 2002 04:46:40 -0800 (PST) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id XAA30323; Wed, 13 Nov 2002 23:46:28 +1100 Date: Wed, 13 Nov 2002 23:59:00 +1100 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: David Schultz Cc: Terry Lambert , Tomas Pluskal , Subject: Re: seeking help to rewrite the msdos filesystem In-Reply-To: <20021113002807.GA4711@HAL9000.homeunix.com> Message-ID: <20021113232517.P381-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=X-UNKNOWN Content-Transfer-Encoding: QUOTED-PRINTABLE Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Tue, 12 Nov 2002, David Schultz wrote: > Thus spake Terry Lambert : > > This has more to do with sequential access. Technically, you can > > read a FAT cluster at a time instead of an FS block at a time, and > > you will achieve some multiplier on sequential access, but you will > > find that under load, that the fault rate for blocks will go up. FAT clusters _are_ FS blocks in msdosfs. > > Also, even if you read 64K at a time, you will end up LRU'ing out > > the data that you don't access. 
> > > > The issue is that UNIX files are accessed by offset, and FAT files > > are accessed by offset by chaining clusters from the start to the > > cluster of interest, and then reading blocks. > > Few people use FAT filesystems under heavy load as they do UFS. > Basically, I think what he wants to do is speed up sequential > reads for a single process doing, say, digital video editing. On I think so too. > a FAT FS that is relatively free of fragmentation, naïve > read-ahead is likely to improve performance for this type of load, > even though the next logical block in the file might not be the > next physical block on the disk. IIRC, SMARTDRV does this. This > approach is optimizing for the single-user case, but if you have > several people using a single FAT FS at a time, you have much > bigger problems. Strangely enough, msdosfs already does naive read-ahead. It uses essentially the old read-ahead code from the version of ffs that it was cloned from (approx. @(#)ufs_vnops.c 7.64 (Berkeley) 5/16/91 ("Net/2")). It doesn't do clustering, but clustering is relatively unimportant in many cases including (apparently) the one here. The problem here seems to be just that some drives don't have any significant buffering and/or have huge command overheads, so even the ffs default block size of 16K is too small. The msdosfs default block size of 2K for ZIP drives is far too small. Clustering increases the effective block size to 64K, which is large enough for most purposes, but msdosfs is missing the few lines of code needed to implement clustering, and read-ahead doesn't help since it is done in units of the too-small block size. This is an old problem, but mostly finished going away about 7 years ago when adequate buffering and/or firmware to manage it became normal in all ordinary disk drives. 
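The effect of block size on a drive with a large per-command overhead can be worked through with a very simple model. The sketch below is one reading of the arithmetic in this thread, not a measurement: it assumes every request costs a fixed 20 msec (the ZIP overhead quoted earlier in the thread), ignores transfer time, and caps the result at the 1000K/sec reference speed. The names OVERHEAD_S, PEAK_KB_S and throughput_kb_s() are hypothetical.

```python
OVERHEAD_S = 0.020   # assumed fixed cost per request (20 msec, from the thread)
PEAK_KB_S = 1000.0   # reference drive speed used in the thread (1000K/sec)


def throughput_kb_s(block_bytes):
    """Throughput in K/sec if each request costs a fixed overhead, capped at the peak."""
    return min(block_bytes / 1024 / OVERHEAD_S, PEAK_KB_S)


for size in (512, 2048, 16384, 65536):
    rate = throughput_kb_s(size)
    print(f"{size:6d} bytes/request: {rate:7.1f} K/sec, "
          f"{PEAK_KB_S / rate:5.2f}x slower than the reference speed")
```

Under these assumptions the model gives the 40x, 10x and 1.25x slowdowns quoted in the thread for 512-byte, 2K and 16K blocks, and a 64K effective block size is enough to reach the cap. A real drive also spends time transferring data, so measured speeds will be lower than this upper bound.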
Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 13 6:28:58 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7F93137B401 for ; Wed, 13 Nov 2002 06:28:57 -0800 (PST) Received: from pohoda.cz (pohoda.pohoda.cz [194.228.111.151]) by mx1.FreeBSD.org (Postfix) with SMTP id C2FA643E88 for ; Wed, 13 Nov 2002 06:28:54 -0800 (PST) (envelope-from plusik@pohoda.cz) Received: (qmail 21586 invoked from network); 13 Nov 2002 13:41:26 -0000 Received: from plusik@pohoda.cz by pohoda.cz by uid 500 with qmail-scanner-1.15 ( Clear:. Processed in 0.050461 secs); 13 lis 2002 13:41:26 -0000 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 13 Nov 2002 13:41:26 -0000 Date: Wed, 13 Nov 2002 14:41:26 +0100 (CET) From: Tomas Pluskal To: Bruce Evans Cc: Terry Lambert , Subject: Re: seeking help to rewrite the msdos filesystem Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org > The problem here seems to be just that some drives don't have any > significant buffering and/or have huge command overheads, so even the > ffs default block size of 16K is too small. The msdosfs default block > size of 2K for ZIP drives is far too small. Clustering increases the > effective block size to 64K, which is large enough for most purposes, > but mdosfs is missing the few lines of code needed to implement > clustering, and read-ahead doesn't help since it is done in units of > the too-small block size. This is an old problem, but mostly finished > going away about 7 years when adequate buffering and/or firmware to > manage it became normal in all ordinary disk drives. 
Could you please write a little more about those "missing few lines of code"? I think it is what I could do and what would help when using ZIP drives and digital cameras etc. Thanks Tomas Pluskal To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 13 7: 6:44 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 19D1037B401 for ; Wed, 13 Nov 2002 07:06:44 -0800 (PST) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id A403743E75 for ; Wed, 13 Nov 2002 07:06:42 -0800 (PST) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id CAA06002; Thu, 14 Nov 2002 02:06:24 +1100 Date: Thu, 14 Nov 2002 02:18:55 +1100 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Tomas Pluskal Cc: Terry Lambert , Subject: Re: seeking help to rewrite the msdos filesystem In-Reply-To: Message-ID: <20021114020947.O6495-100000@gamplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Wed, 13 Nov 2002, Tomas Pluskal wrote: > > ... Clustering increases the > > effective block size to 64K, which is large enough for most purposes, > > but msdosfs is missing the few lines of code needed to implement > > clustering... > > Could you please write a little more about those "missing few > lines of code"? I think it is what I could do and what would help when > using ZIP drives and digital cameras etc. "grep -i cluster *.c" in code for other file systems. cd9660 is simplest -- it just has one cluster_read() instead of a bread(). 
Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 14 15:32:37 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4605137B401 for ; Thu, 14 Nov 2002 15:32:36 -0800 (PST) Received: from hotmail.com (f223.pav1.hotmail.com [64.4.31.223]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0AE5243E4A for ; Thu, 14 Nov 2002 15:32:36 -0800 (PST) (envelope-from johny122@hotmail.com) Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC; Thu, 14 Nov 2002 15:32:35 -0800 Received: from 216.209.86.50 by pv1fd.pav1.hotmail.msn.com with HTTP; Thu, 14 Nov 2002 23:32:35 GMT X-Originating-IP: [216.209.86.50] From: "john ashfield" To: freebsd-fs@FreeBSD.org Subject: ext2fs source Date: Thu, 14 Nov 2002 20:02:35 -0330 Mime-Version: 1.0 Content-Type: text/plain; format=flowed Message-ID: X-OriginalArrivalTime: 14 Nov 2002 23:32:35.0878 (UTC) FILETIME=[1CC7E460:01C28C36] Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Hi all, Could someone tell me where to download ext2 filesystem driver source? I downloaded the source from the links below, but I could not compile successfully. 
http://savannah.gnu.org/cgi-bin/viewcvs/~checkout~/hurd/hurd/ext2fs/ http://ftp.ipv4.heanet.ie/pub/OpenBSD/src/sys/ufs/ext2fs/ Thanks To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 15 11:11:39 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5518737B401 for ; Fri, 15 Nov 2002 11:11:38 -0800 (PST) Received: from vbook.express.ru (vbook.nc.express.ru [212.24.37.35]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4C76243E9C for ; Fri, 15 Nov 2002 11:11:30 -0800 (PST) (envelope-from vova@sw.ru) Received: from vova by vbook.express.ru with local (Exim 4.10) id 18Cls5-0000IW-00 for fs@freebsd.org; Fri, 15 Nov 2002 22:11:17 +0300 Subject: Question about not locked vnode in VOP_RENAME From: "Vladimir B. " Grebenschikov To: fs@freebsd.org Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: SWsoft Inc. Message-Id: <1037387476.1037.6.camel@vbook> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.1.2 (Preview Release) Date: 15 Nov 2002 22:11:17 +0300 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Hi ppl Question about int VOP_RENAME(struct vnode *fdvp, struct vnode *fvp, struct componentname *fcnp, struct vnode *tdvp, struct vnode *tvp, struct componentname *tcnp); It gets fdvp unlocked. Why does it differ from other similar VOPs? What will happen if, between VOP_LOOKUP and VOP_RENAME, another VOP_LOOKUP happens, say for file removal in the same directory? The second lookup can destroy in-inode data (for ufs) saved by the first lookup. 

It seems that the panics
http://spitfire.velocet.net/pipermail/freebsd-stable/2002-January/025074.html
http://docs.freebsd.org/mail/archive/1998/freebsd-current/19980913.freebsd-current.html
are caused by this race.

-- 
Vladimir B. Grebenschikov
SWsoft Inc.

From owner-freebsd-fs Fri Nov 15 20:21:46 2002
Delivered-To: freebsd-fs@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 931) id DA58F37B401; Fri, 15 Nov 2002 20:21:45 -0800 (PST)
Date: Fri, 15 Nov 2002 20:21:45 -0800
From: Juli Mallett
To: john ashfield
Cc: freebsd-fs@FreeBSD.org
Subject: Re: ext2fs source
Message-ID: <20021115202145.A73247@FreeBSD.org>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: ; from johny122@hotmail.com on Thu, Nov 14, 2002 at 08:02:35PM -0330
Organisation: The FreeBSD Project
X-Alternate-Addresses: , , , ,
X-Towel: Yes
X-LiveJournal: flata, jmallett
X-Negacore: Yes

* From: john ashfield
  [ Date: 2002-11-14 ]
  [ Subject: ext2fs source ]
> Hi all,
>
> Could someone tell me where to download the ext2 filesystem driver
> source? I downloaded the source from the links below, but I could not
> compile it successfully.

It's in the FreeBSD source tree, in src/sys/gnu/ext2fs. Read about
configuring your own kernel to find out how to build in EXT2FS support.

> http://savannah.gnu.org/cgi-bin/viewcvs/~checkout~/hurd/hurd/ext2fs/

Not even close; this is for GNU/Hurd.

> http://ftp.ipv4.heanet.ie/pub/OpenBSD/src/sys/ufs/ext2fs/

And this is for OpenBSD, which is closer, except not.

juli.
-- 
Juli Mallett
OpenDarwin, Mono, FreeBSD Developer.
ircd-hybrid Developer, EFnet addict.
FreeBSD on MIPS-Anything on FreeBSD.

From owner-freebsd-fs Sat Nov 16 3:32:59 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D51A337B401 for ; Sat, 16 Nov 2002 03:32:57 -0800 (PST)
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5118243E75 for ; Sat, 16 Nov 2002 03:32:56 -0800 (PST) (envelope-from bde@zeta.org.au)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id WAA05947; Sat, 16 Nov 2002 22:32:46 +1100
Date: Sat, 16 Nov 2002 22:45:28 +1100 (EST)
From: Bruce Evans
X-X-Sender: bde@gamplex.bde.org
To: "Vladimir B. Grebenschikov"
Cc: fs@FreeBSD.ORG
Subject: Re: Question about not locked vnode in VOP_RENAME
In-Reply-To: <1037387476.1037.6.camel@vbook>
Message-ID: <20021116221131.Q18243-100000@gamplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On 15 Nov 2002, Vladimir B. Grebenschikov wrote:

> int
> VOP_RENAME(struct vnode *fdvp, struct vnode *fvp,
>     struct componentname *fcnp, struct vnode *tdvp,
>     struct vnode *tvp, struct componentname *tcnp);
>
> It gets fdvp unlocked. Why does it differ from other similar VOPs?

It must be left unlocked to avoid deadlock.

> What happens if, between VOP_LOOKUP and VOP_RENAME, another VOP_LOOKUP
> occurs, say for a file removal in the same directory?

Nothing bad should happen.  If applications race each other renaming
and/or unlinking files, then the files may end up in unexpected places
depending on who loses the races, but the results should not be worse
because the races are lost inside of rename(2).
E.g., in 2 interesting cases:
(1) the "from" file gets completely unlinked (by another process).
    Then it will normally be relinked as the new "to" file.  The
    result is the same as if the other process attempted to unlink it
    after it was renamed, except the unlink will fail with errno
    ENOENT.
(2) the "from" file gets renamed to another file (possibly in a
    different directory) (by another process).  Then the other file
    will normally be linked to the new "to" file (and it won't be
    unlinked by the rename).

> The second lookup can destroy the in-inode data (for ufs) saved by the
> first lookup.

Inodes can't be corrupted by changes to directory entries, except
possibly as they reflect corruption of the directory tree.

> It seems that the panics
> http://spitfire.velocet.net/pipermail/freebsd-stable/2002-January/025074.html
> http://docs.freebsd.org/mail/archive/1998/freebsd-current/19980913.freebsd-current.html
> are caused by this race.

The serious races in ufs_rename(), if any, are later.  I think there is
a serious one for the doingdirectory && newparent case.  Then both the
source and target need to be unlocked, so there is nothing to prevent
arbitrary (non-corrupting) changes to the directory tree underneath us.
I don't see how ufs_checkpath() + relookup() can handle all cases
correctly (they should bail out if the directory tree changed too
much).

Bruce
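
[Editorial sketch, not part of the original thread.] From an application's point of view, a rename(2) competing with an unlink(2) of the same "from" name ends up in one of two serialized orderings. The C fragment below walks through both, one after the other in a single process. It is only a userland illustration under stated assumptions: it cannot reproduce the in-kernel window between VOP_LOOKUP and VOP_RENAME discussed above, and the helper names (rename_wins, unlink_wins) and scratch file names are hypothetical.

```c
/* Expose the POSIX declarations even under strict -std= flags. */
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical scratch names, created inside the directory passed in. */
#define FROM_NAME "rename_demo_from"
#define TO_NAME   "rename_demo_to"

/* Create (or truncate) an empty file; 0 on success, -1 on failure. */
static int
touch(const char *path)
{
	int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);

	return (fd < 0 ? -1 : close(fd));
}

static void
make_paths(const char *dir, char *from, char *to, size_t len)
{
	snprintf(from, len, "%s/%s", dir, FROM_NAME);
	snprintf(to, len, "%s/%s", dir, TO_NAME);
}

/*
 * rename(2) completes first, then the competing unlink(2) of the old
 * name runs.  Returns the errno seen by unlink (0 if it succeeded),
 * or -1 if the setup failed or the "to" file did not survive.
 */
int
rename_wins(const char *dir)
{
	char from[1024], to[1024];
	int err;

	make_paths(dir, from, to, sizeof(from));
	if (touch(from) != 0 || rename(from, to) != 0)
		return (-1);
	err = (unlink(from) == 0) ? 0 : errno;
	if (access(to, F_OK) != 0)
		return (-1);
	(void)unlink(to);		/* clean up the scratch file */
	return (err);
}

/*
 * unlink(2) completes first, then the competing rename(2) runs.
 * Returns the errno seen by rename (0 if it succeeded), or -1 if the
 * setup failed or a "to" file appeared anyway.
 */
int
unlink_wins(const char *dir)
{
	char from[1024], to[1024];
	int err;

	make_paths(dir, from, to, sizeof(from));
	(void)unlink(to);		/* start without a stale target */
	if (touch(from) != 0 || unlink(from) != 0)
		return (-1);
	err = (rename(from, to) == 0) ? 0 : errno;
	if (access(to, F_OK) == 0) {
		(void)unlink(to);
		return (-1);
	}
	return (err);
}

#ifdef DEMO_MAIN
static const char *
describe(int err)
{
	if (err < 0)
		return ("setup failed");
	return (err == 0 ? "succeeded" : strerror(err));
}

int
main(int argc, char **argv)
{
	const char *dir = (argc > 1) ? argv[1] : "/tmp";

	printf("rename first, then unlink(from): %s\n",
	    describe(rename_wins(dir)));
	printf("unlink first, then rename(from, to): %s\n",
	    describe(unlink_wins(dir)));
	return (0);
}
#endif
```

Building with -DDEMO_MAIN and passing a writable scratch directory runs both orderings. In each, the loser sees ENOENT and the directory is left in a consistent state, which is the userland-visible part of the "nothing bad should happen" argument; the locking questions inside ufs_rename() are outside what this sketch can show.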