From owner-freebsd-fs Mon Aug 3 04:10:11 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id EAA08102 for freebsd-fs-outgoing; Mon, 3 Aug 1998 04:10:11 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from gw-nl1.philips.com (gw-nl1.philips.com [192.68.44.33]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id EAA08038 for ; Mon, 3 Aug 1998 04:10:07 -0700 (PDT) (envelope-from Jos.Backus@nl.origin-it.com) Received: from smtprelay-nl1.philips.com (localhost.philips.com [127.0.0.1]) by gw-nl1.philips.com with ESMTP id NAA29841 for ; Mon, 3 Aug 1998 13:09:56 +0200 (MEST) (envelope-from Jos.Backus@nl.origin-it.com) Received: from hal.mpn.cp.philips.com (hal.mpn.cp.philips.com [130.139.64.195]) by smtprelay-nl1.philips.com (8.8.5/8.6.10-1.2.2m-970826) with SMTP id NAA26853 for ; Mon, 3 Aug 1998 13:09:55 +0200 (MET DST) Received: (qmail 26090 invoked by uid 666); 3 Aug 1998 11:07:57 -0000 Message-ID: <19980803130757.D25064@hal.mpn.cp.philips.com> Date: Mon, 3 Aug 1998 13:07:57 +0200 From: Jos Backus To: freebsd-fs@FreeBSD.ORG Subject: Re: Trying to recover lost file References: <199807311235.IAA01340@wkstn4-208.lxr.georgetown.edu> <199807312116.OAA29689@usr09.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.1i In-Reply-To: <199807312116.OAA29689@usr09.primenet.com>; from Terry Lambert on Fri, Jul 31, 1998 at 09:16:23PM +0000 X-Files: The Truth is out there! Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Note that there's an interesting ufs analysis tool at http://www.pobox.com/~djb/software/ufsread-0.50.shar.gz (Caveat: I have only tried it under BSD/OS). -- Jos Backus _/ _/_/_/ "Reliability means never _/ _/ _/ having to say you're sorry." _/ _/_/_/ -- D. J. 
Bernstein _/ _/ _/ _/ Jos.Backus@nl.origin-it.com _/_/ _/_/_/ use Std::Disclaimer; To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Aug 3 06:05:08 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id GAA21889 for freebsd-fs-outgoing; Mon, 3 Aug 1998 06:05:08 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from wkstn4-208.lxr.georgetown.edu (wkstn4-208.lxr.georgetown.edu [141.161.4.208]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id GAA21884 for ; Mon, 3 Aug 1998 06:05:05 -0700 (PDT) (envelope-from mystify@wkstn4-208.lxr.georgetown.edu) Received: from wkstn4-208.lxr.georgetown.edu (localhost [127.0.0.1]) by wkstn4-208.lxr.georgetown.edu (8.8.8/8.8.8) with ESMTP id JAA10804; Mon, 3 Aug 1998 09:04:45 -0400 (EDT) (envelope-from mystify@wkstn4-208.lxr.georgetown.edu) Message-Id: <199808031304.JAA10804@wkstn4-208.lxr.georgetown.edu> Reply-to: Patrick Hartling To: Terry Lambert cc: mystify@wkstn4-208.lxr.georgetown.edu (Patrick Hartling), freebsd-fs@FreeBSD.ORG, mystify@wkstn4-208.lxr.georgetown.edu Subject: Re: Trying to recover lost file In-reply-to: Message from Terry Lambert of "Fri, 31 Jul 1998 21:16:23 -0000." <199807312116.OAA29689@usr09.primenet.com> Date: Mon, 03 Aug 1998 09:04:45 -0400 From: Patrick Hartling Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Terry Lambert wrote: } Please tell me you were mounted sync, and tell me you didn't create any } files on the drive after you did this! Yes, and yes. :) I know better than to mount important stuff async, but I am definitely kicking myself for gzip'ing it. Thank you very much for all the detailed information. I just got back from a mini weekend vacation and will be starting in on this recovery tonight. With all your information and Duncan Barclay's code, I hope to get my work back soon. I really appreciate your help. -Patrick Patrick L. 
Hartling | Research Assistant, ICEMT mystify@wkstn4-208.lxr.georgetown.edu | SE Lab - 1117 Black Engineering http://www.public.iastate.edu/~oz/ | http://www.icemt.iastate.edu/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Aug 3 08:03:50 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id IAA05941 for freebsd-fs-outgoing; Mon, 3 Aug 1998 08:03:50 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from postman.true.net (s1.admin.true.net [161.196.66.100]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id IAA05935 for ; Mon, 3 Aug 1998 08:03:48 -0700 (PDT) (envelope-from lem@cantv.net) Received: from s2.admin.true.net (mail.cantv.net [161.196.66.21]) by postman.true.net (8.8.7/8.6.12) with ESMTP id LAA21748; Mon, 3 Aug 1998 11:03:40 -0400 (VET) Received: from lem.cantv.net (root@localhost) by s2.admin.true.net (8.8.7/CS-R-1.4) with SMTP id LAA28331; Mon, 3 Aug 1998 11:02:45 -0400 (VET) X-BlackMail: ws-7.chacao-01.int.cantv.net, lem.cantv.net, lem@cantv.net, 200.44.44.23 X-Authenticated-Timestamp: 11:02:45(VET) on August 03, 1998 Message-Id: <3.0.5.32.19980803101622.008ac100@pop.cantv.net> X-Sender: lem@pop.cantv.net X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.5 (32) Date: Mon, 03 Aug 1998 10:16:22 -0400 To: Duncan Barclay From: Luis Munoz Subject: Re: Trying to recover lost file Cc: (Patrick Hartling) , freebsd-fs@FreeBSD.ORG In-Reply-To: References: <199807312116.OAA29689@usr09.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org At 12:47 AM 01/08/1998 +0100, Duncan Barclay wrote: [snip] >Terry a big problem under FreeBSD is that it hoses the inode pretty >quickly. I know, I did the same thing to a chapter of my PhD thesis >a year or so ago. Found the inode as you described and it was all 0... 
[snip] I think many 'modern' UNIXes do. This happened to me on SunOS 4.1.[23] a few years ago. In my case (I lost a bunch of C files) it was a matter of reading the cylinder group with dd and searching with Perl. Since most files were under 8k, I found them contiguously. In SunOS, writes were organized in 8k blocks (more on some machines, I think). If FreeBSD does the same then probably you have to do much less tinkering to assemble the files again. Regards (and luck!) -lem To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Aug 4 11:01:11 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id LAA24653 for freebsd-fs-outgoing; Tue, 4 Aug 1998 11:01:11 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id KAA24123; Tue, 4 Aug 1998 10:58:50 -0700 (PDT) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: (from wollman@localhost) by khavrinen.lcs.mit.edu (8.8.8/8.8.8) id NAA03021; Tue, 4 Aug 1998 13:58:33 -0400 (EDT) (envelope-from wollman) Date: Tue, 4 Aug 1998 13:58:33 -0400 (EDT) From: Garrett Wollman Message-Id: <199808041758.NAA03021@khavrinen.lcs.mit.edu> To: freebsd-fs@FreeBSD.ORG Cc: core@FreeBSD.ORG Subject: Exclusive locking for directory lookups? Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Does anybody remember why plain-jane directory lookups (i.e., not deleting or creating anything) require an exclusive lock on all the directory vnodes along the path? It would seem to be that only shared locks should be necessary in those cases... -GAWollman -- Garrett A.
Wollman | O Siem / We are all family / O Siem / We're all the same wollman@lcs.mit.edu | O Siem / The fires of freedom Opinions not those of| Dance in the burning flame MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Aug 4 11:35:58 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id LAA02221 for freebsd-fs-outgoing; Tue, 4 Aug 1998 11:35:58 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from wanadoo.fr (smtp-out-2.wanadoo.fr [193.252.19.69]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id LAA02194 for ; Tue, 4 Aug 1998 11:35:47 -0700 (PDT) (envelope-from poipoi@famipow.com) From: poipoi@famipow.com Received: from aralia.wanadoo.fr [193.252.19.42] by wanadoo.fr for Paris Tue, 4 Aug 1998 20:35:18 +0200 (MET DST) Received: from qmailr@meaux10-169.abo.wanadoo.fr [164.138.6.169] by smtp.wanadoo.fr for Paris Tue, 4 Aug 1998 20:35:13 +0200 (MET DST) Received: (qmail 378 invoked by uid 501); 4 Aug 1998 17:57:05 -0000 Message-ID: <19980804175705.377.qmail@hwi.poi.org> Subject: file hole ? To: freebsd-fs@FreeBSD.ORG Date: Tue, 4 Aug 1998 19:57:05 +0200 (MET DST) Cc: linux-fsdevel@vger.rutgers.edu, poipoi@famipow.com (moi) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org hi i want to know how to handle file hole. for example, i have a 8k file. i do a seek at 20000 and write a byte. Does the fs alloc every block to store 20001 bytes ? yes ? but its a space wasting... no ? but when the user will fill the hole (writing from 8192 to 20000), my fs will perhaps be full and i have to reject the write operation... what is the standard (good?) behaviour ? and why (if possible) ? 
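You can reproduce the scenario in the question from user space. The following C sketch is illustrative only, and the helper name make_holey_file is hypothetical. It builds the 8k file, seeks to offset 20000, and writes one byte. It then reports three things: st_size, st_blocks (counted in 512-byte units on FreeBSD and Linux), and whether the gap reads back as zeros. POSIX requires the gap to read as zeros whether or not the filesystem allocates blocks for it.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: recreate the file from the question (8k of data,
 * then one byte written at offset 20000) and report what the filesystem
 * did.  Returns 0 on success, -1 on any I/O error.
 */
int
make_holey_file(const char *path, long long *size, long long *blocks,
    int *gap_is_zero)
{
	char buf[8192];
	struct stat st;
	long long left;
	ssize_t got, i;
	int fd;

	assert(path != NULL && size != NULL && blocks != NULL &&
	    gap_is_zero != NULL);
	if ((fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600)) == -1)
		return -1;

	memset(buf, 'x', sizeof(buf));                  /* the 8k file */
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf) ||
	    lseek(fd, 20000, SEEK_SET) == -1 ||         /* seek past EOF */
	    write(fd, "y", 1) != 1 ||                   /* one byte at 20000 */
	    fstat(fd, &st) == -1 ||
	    lseek(fd, 8192, SEEK_SET) == -1)            /* back to the gap */
		goto fail;

	/* POSIX: bytes in the gap [8192, 20000) must read back as zeros. */
	*gap_is_zero = 1;
	for (left = 20000 - 8192; left > 0; left -= got) {
		size_t want = left < (long long)sizeof(buf) ?
		    (size_t)left : sizeof(buf);

		if ((got = read(fd, buf, want)) <= 0)
			goto fail;
		for (i = 0; i < got; i++)
			if (buf[i] != 0)
				*gap_is_zero = 0;
	}

	*size = (long long)st.st_size;      /* logical size: 20001 bytes */
	*blocks = (long long)st.st_blocks;  /* allocated 512-byte units */
	close(fd);
	return 0;
fail:
	close(fd);
	return -1;
}
```

On a filesystem that leaves holes, blocks * 512 will typically come out below the 20001-byte logical size. The exact figure depends on the filesystem's block and fragment sizes, so treat any particular number as illustrative.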
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Aug 4 12:25:05 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id MAA09558 for freebsd-fs-outgoing; Tue, 4 Aug 1998 12:25:05 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from sparrow.websense.net (dial-127-tnt-btvt-02.ramp.together.net [209.91.2.127]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA09552 for ; Tue, 4 Aug 1998 12:25:02 -0700 (PDT) (envelope-from wstearns@pobox.com) Received: from localhost (wstearns@localhost [127.0.0.1]) by sparrow.websense.net (8.8.7/8.8.7) with SMTP id PAA12773; Tue, 4 Aug 1998 15:23:16 -0400 Date: Tue, 4 Aug 1998 15:23:15 -0400 (EDT) From: William Stearns X-Sender: wstearns@sparrow.websense.net To: moi cc: freebsd-fs@FreeBSD.ORG, linux-fsdevel@vger.rutgers.edu Subject: Re: file hole ? In-Reply-To: <19980804175705.377.qmail@hwi.poi.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, 4 Aug 1998 poipoi@famipow.com wrote: > hi > > i want to know how to handle file hole. > for example, i have a 8k file. i do a seek at 20000 and write a byte. > > Does the fs alloc every block to store 20001 bytes ? > yes ? but its a space wasting... > no ? but when the user will fill the hole (writing from 8192 to 20000), I can't speak for freebsd, but my best understanding about Linux' ext2 filesystem is that yes, it supports what you're describing (commonly known as "sparse" files). Novell Netware does as well. I'm not sure whether this is a feature of the ext2 filesystem only or exists in Linux' other filesystem implementations. This means that the uninitialized blocks of the file will not take up physical space on disk. > my fs will perhaps be full and i have to reject the write operation... I would guess that's correct. 
On the other hand, if it didn't support sparse files, you would not even have been able to create the file in the first place. > what is the standard (good?) behaviour ? and why (if possible) ? I consider this the preferred behaviour. The case of running out of disk space is just that, a _disk_ _space_ problem; an error writing to a sparse file is only a symptom of the problem. In fact, handling sparse files delays the out-of-space condition, making it less likely to occur. Cheers, - Bill --------------------------------------------------------------------------- Unix _is_ user friendly. It's just very selective about who its friends are. And sometimes even best friends have fights. William Stearns (wstearns@pobox.com) Mason, buildkernel, and named2hosts are at: http://www.pobox.com/~wstearns --------------------------------------------------------------------------- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Aug 4 13:13:56 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id NAA18277 for freebsd-fs-outgoing; Tue, 4 Aug 1998 13:13:56 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from www.schell.de ([195.20.238.2]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id NAA18254 for ; Tue, 4 Aug 1998 13:13:49 -0700 (PDT) (envelope-from sas@schell.de) Received: from guerilla.foo.bar (hennen15.iserlohn.netsurf.de [194.195.194.207]) by www.schell.de (8.9.0/8.9.0) with ESMTP id WAA25522; Tue, 4 Aug 1998 22:13:29 +0200 Received: from localhost (localhost.foo.bar [127.0.0.1]) by guerilla.foo.bar (8.9.1/8.9.1) with SMTP id WAA13809; Tue, 4 Aug 1998 22:13:22 +0200 (CEST) Date: Tue, 4 Aug 1998 22:13:22 +0200 (CEST) From: Sascha Schumann To: moi cc: freebsd-fs@FreeBSD.ORG, linux-fsdevel@vger.rutgers.edu Subject: Re: file hole ? 
In-Reply-To: <19980804175705.377.qmail@hwi.poi.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org man 2 lseek: "The lseek() function allows the file offset to be set beyond the end of the existing end-of-file of the file. If data is later written at this point, subsequent reads of the data in the gap return bytes of zeros (until data is actually written into the gap). ..." Most fs' I know (including UFS and ext2fs) preallocate blocks before data is written, so you will never have this kind of file fragmentation. HTH, Sascha On Tue, 4 Aug 1998 poipoi@famipow.com wrote: > hi > > i want to know how to handle file hole. > for example, i have a 8k file. i do a seek at 20000 and write a byte. > > Does the fs alloc every block to store 20001 bytes ? > yes ? but its a space wasting... > no ? but when the user will fill the hole (writing from 8192 to 20000), > my fs will perhaps be full and i have to reject the write operation... > > what is the standard (good?) behaviour ? and why (if possible) ? 
> > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-fs" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Aug 4 15:28:51 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id PAA07243 for freebsd-fs-outgoing; Tue, 4 Aug 1998 15:28:51 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id PAA07219 for ; Tue, 4 Aug 1998 15:28:43 -0700 (PDT) (envelope-from tlambert@usr07.primenet.com) Received: (from daemon@localhost) by smtp01.primenet.com (8.8.8/8.8.8) id PAA01978; Tue, 4 Aug 1998 15:28:16 -0700 (MST) Received: from usr07.primenet.com(206.165.6.207) via SMTP by smtp01.primenet.com, id smtpd001928; Tue Aug 4 15:28:09 1998 Received: (from tlambert@localhost) by usr07.primenet.com (8.8.5/8.8.5) id PAA06764; Tue, 4 Aug 1998 15:28:05 -0700 (MST) From: Terry Lambert Message-Id: <199808042228.PAA06764@usr07.primenet.com> Subject: Re: file hole ? To: poipoi@famipow.com Date: Tue, 4 Aug 1998 22:27:59 +0000 (GMT) Cc: freebsd-fs@FreeBSD.ORG, linux-fsdevel@vger.rutgers.edu In-Reply-To: <19980804175705.377.qmail@hwi.poi.org> from "poipoi@famipow.com" at Aug 4, 98 07:57:05 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > i want to know how to handle file hole. > for example, i have a 8k file. i do a seek at 20000 and write a byte. > > Does the fs alloc every block to store 20001 bytes ? Depends on the FS. Most UNIX FS's do not. MSDOSFS on UNIX would have to, because there is no concept of a sparse file in the FS's design. > yes ? but its a space wasting... Generally "yes" for Linux or FreeBSD using their native FS's. > no ? 
but when the user will fill the hole (writing from 8192 to 20000), > my fs will perhaps be full and i have to reject the write operation... You can put 10 pounds of manure in a 5 pound bag. That would be a blivet. > what is the standard (good?) behaviour ? To support sparse files (leave holes, unless specifically asked to fill them in). > and why (if possible) ? Because you can explicitly fill the holes in if you want if the default is to leave holes, but you can't explicitly cause holes if the default is to fill them in. 8-). Consider the case of a sparsely filled database that uses a perfect hash to locate all its records. The file containing the records is sparsely used, and the majority of blocks backing the file would be wasted, unless the underlying filesystem supported sparse files. If you want to "fill in the file" so that the system will "commit" to having backing for all of the possible records in the file, you should use statfs(2) and use f_bsize to determine the block size. You should then multiply f_bsize by f_bavail (or by f_bfree, if you are root and are willing to overfill the disk, damaging performance), and if that number is larger than the file you want to allocate, then there is sufficient space for the allocation. If you are concerned with the possibility that other users may consume the space after the seek/write, but before you have filled in the subsequent blocks, and therefore want the blocks preallocated (thus wasting the space for an indeterminate amount of time), you should do the following:

#include <sys/param.h>
#include <sys/mount.h>          /* struct statfs, statfs(2) on FreeBSD */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical placeholder values: set these for your system. */
#define PATH_TO_FS              "/var"
#define PATH_TO_FILE            "/var/db/records"
#define SIZE_WANTED_IN_BYTES    (16L * 1024 * 1024)

int
main(void)
{
    struct statfs sfsb;
    static const char zero = '\0';
    long i;
    long count;
    int fd;

    if (statfs(PATH_TO_FS, &sfsb) == -1) {
        perror("statfs");
        exit(1);
    }
    count = (SIZE_WANTED_IN_BYTES + sfsb.f_bsize - 1) / sfsb.f_bsize;
    if (count > sfsb.f_bfree || (getuid() != 0 && count > sfsb.f_bavail)) {
        fprintf(stderr, "You maniac!\007\n");
        exit(1);
    }
    if ((fd = open(PATH_TO_FILE, O_WRONLY | O_CREAT, 0644)) == -1) {
        perror("open");
        exit(1);
    }

    /*
     * I work for Quantum; the only good disk is a full disk.
     * Write one byte at the end of each f_bsize chunk, so that every
     * chunk up to SIZE_WANTED_IN_BYTES gets backing store.
     */
    lseek(fd, sfsb.f_bsize - 1, SEEK_SET);         /* we fear frags */
    for (i = 0; i < count; i++) {
        if (write(fd, &zero, 1) != 1) {             /* spam! */
            perror("write");
            exit(1);
        }
        lseek(fd, sfsb.f_bsize - 1, SEEK_CUR);     /* end of next chunk */
    }
    printf("Success: disk spammed, per your instructions\n");
    exit(0);
}

Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Aug 4 16:20:33 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id QAA14386 for freebsd-fs-outgoing; Tue, 4 Aug 1998 16:20:33 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id QAA14151; Tue, 4 Aug 1998 16:18:43 -0700 (PDT) (envelope-from tlambert@usr07.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.8.8/8.8.8) id QAA04617; Tue, 4 Aug 1998 16:18:32 -0700 (MST) Received: from usr07.primenet.com(206.165.6.207) via SMTP by smtp03.primenet.com, id smtpd004591; Tue Aug 4 16:18:28 1998 Received: (from tlambert@localhost) by usr07.primenet.com (8.8.5/8.8.5) id QAA11288; Tue, 4 Aug 1998 16:18:25 -0700 (MST) From: Terry Lambert Message-Id: <199808042318.QAA11288@usr07.primenet.com> Subject: Re: Exclusive locking for directory lookups?
To: wollman@khavrinen.lcs.mit.edu (Garrett Wollman) Date: Tue, 4 Aug 1998 23:18:25 +0000 (GMT) Cc: freebsd-fs@FreeBSD.ORG, core@FreeBSD.ORG In-Reply-To: <199808041758.NAA03021@khavrinen.lcs.mit.edu> from "Garrett Wollman" at Aug 4, 98 01:58:33 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Does anybody remember why plain-jane directory lookups (i.e., not > deleting or creating anything) require an exclusive lock on all the > directory vnodes along the path? It would seem to be that only shared > locks should be necessary in those cases... Because there is no flag from namei() to indicate whether the terminal path component is going to be modified or not. This is more a symptom of the way namei() is implemented; it can't support inheritance of such a flag (or inheritance of a POSIX namespace escape ("///")) for the same reason. The locking is a tail-chase down the tree -- that is, the lock is one-behind (parent and child) and is not held to the root. The real question is whether the race this is protecting against still exists with a unified VM and buffer cache; if it does not, you could reexamine the need for it (for this to work, you would need my patch to have namei() return EEXIST, instead of duplicating the code every place you want a lookup to fail if the target exists, i.e. the create/rename target cases, etc. -- you need to distinguish internal vs. external error returns for this case). My gut feeling is that it is still necessary, even though the race it was intended to protect against was VM and buffer cache coherency with multiple accesses in the "write entry" case, mostly because of the late buffer mapping. I could be wrong here, though. In any case, that would mean you would need to add a flag to indicate a terminal component lookup to VOP_LOOKUP.
This is somewhat problematic, because an underlying FS is permitted to eat as many components as it wants to, according to the design. To get around this, the fact that it is the terminal component would have to be indicated by the non-existence of a "next component". Implementing this approach would require pre-parsing the path into components. The easiest way to do this would be to keep a separate "total length" in the path component buffer, and replace the path separators with NUL, treating it as a pre-strtok'ed string. If you go this route, consider providing access macros as well, making the underlying FS advance cn_nameptr (if it consumes extra components), and in general making the structure opaque enough that we could support multiple namespaces (the current VFAT short name binding and the assumption of the ISO 8859-1 character set instead of Unicode are broken). The underlying VOP_LOOKUP for the "writing an entry" case would use the accessor macro to ask for the start of the next component; if it got a NULL back, it would know it was terminal, and that it needed to lock (handling the EEXIST case in namei() lets you avoid this lock if the lookup would succeed). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
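The following is a minimal user-space sketch of the pre-parsed component buffer described above. The names it uses (struct compbuf, cb_init, cb_first, cb_next, cb_is_terminal) are hypothetical and do not correspond to the kernel's struct componentname. The sketch turns separators into NULs and keeps the total length separately. Its next-component accessor returns NULL when the current component is terminal. That NULL is the signal a VOP_LOOKUP for the "writing an entry" case would use to decide whether to lock.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CB_MAXPATH 1024     /* hypothetical limit for the sketch */

/*
 * Hypothetical pre-parsed path buffer: every '/' is replaced by NUL and
 * the total length is kept separately, so the buffer reads as a
 * "pre-strtok'ed" string.
 */
struct compbuf {
	char   cb_buf[CB_MAXPATH];
	size_t cb_len;      /* total length, counting the embedded NULs */
};

/* Copy path into cb and split it in place.  Returns 0, or -1 if too long. */
int
cb_init(struct compbuf *cb, const char *path)
{
	size_t i, len;

	assert(cb != NULL && path != NULL);
	len = strlen(path);
	if (len >= sizeof(cb->cb_buf))
		return -1;
	memcpy(cb->cb_buf, path, len + 1);
	for (i = 0; i < len; i++)
		if (cb->cb_buf[i] == '/')
			cb->cb_buf[i] = '\0';
	cb->cb_len = len;
	return 0;
}

/* First non-empty component starting at offset off, or NULL if none. */
static char *
cb_scan(struct compbuf *cb, size_t off)
{
	while (off < cb->cb_len && cb->cb_buf[off] == '\0')
		off++;      /* skip empty components ("//", leading "/") */
	return off < cb->cb_len ? &cb->cb_buf[off] : NULL;
}

char *
cb_first(struct compbuf *cb)
{
	return cb_scan(cb, 0);
}

/* Accessor: start of the component after cp, or NULL if cp is terminal. */
char *
cb_next(struct compbuf *cb, const char *cp)
{
	return cb_scan(cb, (size_t)(cp - cb->cb_buf) + strlen(cp));
}

/* What a lookup routine would test before deciding it must lock. */
int
cb_is_terminal(struct compbuf *cb, const char *cp)
{
	return cb_next(cb, cp) == NULL;
}
```

The scan skips empty components produced by a leading "/" or by "//". For example, "/usr/local//bin" yields "usr", "local", and "bin", and only "bin" is reported as terminal.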
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Aug 6 10:12:36 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id KAA10565 for freebsd-fs-outgoing; Thu, 6 Aug 1998 10:12:36 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from khavrinen.lcs.mit.edu (khavrinen.lcs.mit.edu [18.24.4.193]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id KAA10533 for ; Thu, 6 Aug 1998 10:12:21 -0700 (PDT) (envelope-from wollman@khavrinen.lcs.mit.edu) Received: (from wollman@localhost) by khavrinen.lcs.mit.edu (8.8.8/8.8.8) id NAA14470; Thu, 6 Aug 1998 13:12:03 -0400 (EDT) (envelope-from wollman) Date: Thu, 6 Aug 1998 13:12:03 -0400 (EDT) From: Garrett Wollman Message-Id: <199808061712.NAA14470@khavrinen.lcs.mit.edu> To: freebsd-fs@FreeBSD.ORG Cc: dillon@backplane.com Subject: Filesystem locking during lookups Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org I ran a modified kernel whose lookup() only uses exclusive locks for CREATE and DELETE namei operations yesterday. So far as I can tell, this is a perfectly reasonable thing to do. However, it does seriously unbalance the relative priorities of lookups vis-a-vis deletes, and this might be considered a security issue. A more complete implementation would use shared locks all the time, and then upgrade to exclusive only when actually necessary. One particular side-effect of this change is that my five-minute dexpire took an hour and a half yesterday noon. I believe this effectively rebuts Matt Dillon's contention that the vfs name cache will prevent problems of lock contention. (And indeed, if one inspects the code, it is clear that the vnode has to be locked first, since the name cache is called by VOP_LOOKUP, which requires a locked vnode.) In 30 seconds, my machine made 1475 calls to namei, of which 209 were root-relative.
Even with these changes, my machine still deadlocked last night (although it lasted longer than it did the night before), so I would say that there is a fair amount of filesystem contention going on. -GAWollman -- Garrett A. Wollman | O Siem / We are all family / O Siem / We're all the same wollman@lcs.mit.edu | O Siem / The fires of freedom Opinions not those of| Dance in the burning flame MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Aug 6 14:51:59 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id OAA15841 for freebsd-fs-outgoing; Thu, 6 Aug 1998 14:51:59 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id OAA15646 for ; Thu, 6 Aug 1998 14:51:16 -0700 (PDT) (envelope-from tlambert@usr07.primenet.com) Received: (from daemon@localhost) by smtp04.primenet.com (8.8.8/8.8.8) id OAA22216; Thu, 6 Aug 1998 14:50:49 -0700 (MST) Received: from usr07.primenet.com(206.165.6.207) via SMTP by smtp04.primenet.com, id smtpd022170; Thu Aug 6 14:50:40 1998 Received: (from tlambert@localhost) by usr07.primenet.com (8.8.5/8.8.5) id OAA25238; Thu, 6 Aug 1998 14:50:35 -0700 (MST) From: Terry Lambert Message-Id: <199808062150.OAA25238@usr07.primenet.com> Subject: Re: Filesystem locking during lookups To: wollman@khavrinen.lcs.mit.edu (Garrett Wollman) Date: Thu, 6 Aug 1998 21:50:34 +0000 (GMT) Cc: freebsd-fs@FreeBSD.ORG, dillon@backplane.com In-Reply-To: <199808061712.NAA14470@khavrinen.lcs.mit.edu> from "Garrett Wollman" at Aug 6, 98 01:12:03 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > I ran a modified kernel whose 
lookup() only uses exclusive locks for > CREATE and DELETE namei operations yesterday. So far as I can tell, > this is a perfectly reasonable thing to do. However, it does > seriously unbalance the relative priorities of lookups vis-a-vis > deletes, and this might be considered a security issue. A more > complete implementation would use shared locks all the time, and then > upgrade to exclusive only when actually necessary. > > One particular side-effect of this change is that my five-minute > dexpire took an hour and a half yesterday noon. I believe this > effectively rebuts Matt Dillon's contention that the vfs name cache > will prevent problems of lock contention. (And indeed, if one > inspects the code, it is clear that the vnode has to be locked first, > since the name cache is called by VOP_LOOKUP, which requires a locked > vnode.) > > In 30 seconds, my machine made 1475 calls to namei, of which 209 were > root-relative. Even with these changes, my machine still deadlocked > last night (although it lasted longer than it did the night before), > so I would say that there is a fair amount of filesystem contention > going on. Are you running soft updates? I have a *very* hard time believing that the lock is being held all the way up to root, intentionally. There's simply no code to actually reverse traverse freeing the locks. Perhaps locked vnodes are being cached with the lock held in some way? Is your news spool mounted right off of root? I'm also very interested in your root vs. relative path count; can you say what the single largest culprit for this is? Is there a single largest culprit responsible for absolute path lookup? Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Aug 6 23:14:07 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id XAA03153 for freebsd-fs-outgoing; Thu, 6 Aug 1998 23:14:07 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from wanadoo.fr (smtp-out-2.wanadoo.fr [193.252.19.69]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id XAA03141 for ; Thu, 6 Aug 1998 23:14:05 -0700 (PDT) (envelope-from poipoi@famipow.com) From: poipoi@famipow.com Received: from aralia.wanadoo.fr [193.252.19.42] by wanadoo.fr for Paris Fri, 7 Aug 1998 08:13:19 +0200 (MET DST) Received: from qmailr@cergy13-44.abo.wanadoo.fr [164.138.1.44] by smtp.wanadoo.fr for Paris Fri, 7 Aug 1998 08:13:14 +0200 (MET DST) Received: (qmail 269 invoked by uid 501); 7 Aug 1998 06:15:58 -0000 Message-ID: <19980807061557.268.qmail@hwi.poi.org> Subject: inode version? To: freebsd-fs@FreeBSD.ORG Date: Fri, 7 Aug 1998 08:15:57 +0200 (MET DST) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org hi in vfs implementation, i see a inode version but i dont understand what is it... according to some books, it is used for NFS but i understand how and why. 
anyone can help me To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Aug 7 00:28:16 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id AAA15880 for freebsd-fs-outgoing; Fri, 7 Aug 1998 00:28:16 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id AAA15796 for ; Fri, 7 Aug 1998 00:28:08 -0700 (PDT) (envelope-from tlambert@usr08.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.8.8/8.8.8) id AAA28917; Fri, 7 Aug 1998 00:27:47 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp03.primenet.com, id smtpd028874; Fri Aug 7 00:27:37 1998 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id AAA22692; Fri, 7 Aug 1998 00:27:35 -0700 (MST) From: Terry Lambert Message-Id: <199808070727.AAA22692@usr08.primenet.com> Subject: Re: inode version? To: poipoi@famipow.com Date: Fri, 7 Aug 1998 07:27:34 +0000 (GMT) Cc: freebsd-fs@FreeBSD.ORG In-Reply-To: <19980807061557.268.qmail@hwi.poi.org> from "poipoi@famipow.com" at Aug 7, 98 08:15:57 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > in vfs implementation, i see a inode version but i dont understand what > is it... according to some books, it is used for NFS but i understand > how and why. It is used to tell the difference between an inode that is rereferenced in a stateless protocol and an inode that is rereferenced after it has been reused. If it has not been reused, the reference is just as valid as before; if it has been reused, the generation count is different, and the reference is stale.
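Here is a minimal sketch of the generation check, built on a hypothetical in-core inode table and file-handle layout. The sk_* names are invented for illustration; in UFS the on-disk inode carries a generation field, di_gen. A handle records the pair (inode number, generation). Reallocating the inode bumps the generation, so the server rejects an old handle that names the same inode number with ESTALE. Without that check, the old handle would silently reach the new file.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NINODES 16          /* hypothetical inode table size */

/* Hypothetical in-core inode, reduced to what the sketch needs. */
struct sk_inode {
	int      i_inuse;
	uint32_t i_gen;     /* bumped each time this inode is allocated */
};

/* Hypothetical stateless file handle held by a client. */
struct sk_fhandle {
	uint32_t fh_ino;
	uint32_t fh_gen;
};

static struct sk_inode itab[NINODES];

/* Allocate inode ino (which must be free) and hand out a handle to it. */
struct sk_fhandle
sk_ialloc(uint32_t ino)
{
	struct sk_fhandle fh;

	assert(ino < NINODES && !itab[ino].i_inuse);
	itab[ino].i_inuse = 1;
	itab[ino].i_gen++;          /* new file, new generation */
	fh.fh_ino = ino;
	fh.fh_gen = itab[ino].i_gen;
	return fh;
}

/* Free inode ino, e.g. when its last link and reference go away. */
void
sk_ifree(uint32_t ino)
{
	assert(ino < NINODES);
	itab[ino].i_inuse = 0;
}

/* Server side: 0 if fh still names the same file, otherwise ESTALE. */
int
sk_fh_validate(const struct sk_fhandle *fh)
{
	const struct sk_inode *ip;

	if (fh->fh_ino >= NINODES)
		return ESTALE;
	ip = &itab[fh->fh_ino];
	if (!ip->i_inuse || ip->i_gen != fh->fh_gen)
		return ESTALE;
	return 0;
}
```

An old handle and a fresh one can carry the same inode number, and only the generation tells them apart.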
This is used to recover from the case where an inode reference count goes to zero on an NFS server when the file is held open over a delete. Because you can't set VEXEC on the server from the client, in a memory overcommit architecture (where the file is used as a swap store), a file may "disappear" out from under you. If this happens, the generation number changes, so you can tell that it happened and fail more gracefully than you would have otherwise. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Aug 8 10:49:05 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id KAA05842 for freebsd-fs-outgoing; Sat, 8 Aug 1998 10:49:05 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from apollo.backplane.com (apollo.backplane.com [209.157.86.2]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id KAA05837 for ; Sat, 8 Aug 1998 10:49:04 -0700 (PDT) (envelope-from dillon@backplane.com) Received: (dillon@localhost) by apollo.backplane.com (8.8.8/8.6.5) id KAA19458; Sat, 8 Aug 1998 10:48:43 -0700 (PDT) Date: Sat, 8 Aug 1998 10:48:43 -0700 (PDT) From: Matthew Dillon Message-Id: <199808081748.KAA19458@apollo.backplane.com> To: Terry Lambert Cc: wollman@khavrinen.lcs.mit.edu (Garrett Wollman), freebsd-fs@FreeBSD.ORG Subject: Re: Filesystem locking during lookups Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Well, I've seen cascade crashes where locks are held all the way to root, but ultimately it comes down to a deadlock on some deep file or directory. But, again, it's doubtful that namei has anything to do with it. It's more likely to be a deadlock in the filesystem vs VM code.
I changed Diablo's feeder code on my test machine to avoid appending to the history file on each write (instead, it appends a large block of zeros and then allocates history records out of the block). nntp3.ba.best.com is the most stable it's been yet... up 9 days now. However, I am still seeing significant VM/fs corruption where mmapping files that are being actively appended to (in this case, the spool files) can cause physical corruption of the file. -Matt :> :> In 30 seconds, my machine made 1475 calls to namei, of which 209 were :> root-relative. Even with these changes, my machine still deadlocked :> last night (although it lasted longer than it did the night before), :> so I would say that there is a fair amount of filesystem contention :> going on. : :Are you running soft updates? : :I have a *very* hard time believing that the lock is being held :all the way up to root, intentionally. There's simply no code :to actually reverse traverse freeing the locks. : :Perhaps locked vnodes are being cached with the lock held in some :way? : :Is your news spool mounted right off of root? : : :I'm also very interested in your root vs. relative path count; can you :say what the single largest culprit for this is? Is there a single :largest culprit responsible for absolute path lookup? : : : Terry Lambert : terry@lambert.org :--- :Any opinions in this posting are my own and not those of my present :or previous employers. : Matthew Dillon Engineering, HiWay Technologies, Inc.
& BEST Internet Communications (Please include original email in any response) From owner-freebsd-fs Sat Aug 8 16:30:09 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id QAA07644 for freebsd-fs-outgoing; Sat, 8 Aug 1998 16:30:09 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id QAA07635 for ; Sat, 8 Aug 1998 16:30:05 -0700 (PDT) (envelope-from michaelh@cet.co.jp) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.8/CET-v2.2) with SMTP id XAA08204; Sat, 8 Aug 1998 23:28:35 GMT Date: Sun, 9 Aug 1998 08:28:34 +0900 (JST) From: Michael Hancock To: Constantine Sapuntzakis cc: freebsd-fs@FreeBSD.ORG Subject: Re: VFS vrele fixes In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org [fs list added to cc] Constantine, I'm still not clear on the best approach. When I started working on VFS_VRELE, I came to dislike where it was going, so I shelved that approach. Yours sounds like an interesting approach, though. I'll have to look into it more some other time after I've popped a few more big projects off my stack. Regards, Mike On 7 Aug 1998, Constantine Sapuntzakis wrote: > Hi Mike, > > Here's another idea for giving file systems control over their vnode > management. > > Change VOP_INACTIVE so the file system is responsible for putting > vnodes on the free list (rather than automatically doing it in vrele, > vput). > > Change VOP_RECLAIM so that the file system is responsible for cleaning > out the vnode (i.e. does a vgone). > > This adds more code to each file system, but much of this code could > probably be rolled into common library functions that the file system > could call.
> > This way, you can build a file system that is totally independent of > the internal vnode management mechanism. And you no longer need > VFS_VRELE. > > > -Costa > > Michael Hancock writes: > > > I've created a web page for some vfs fixes I'm working on. Comments are > > appreciated. The changes will relieve general vnode management from all > > fs implementations. Taken as a whole, the scope of the changes is fairly > > large. > > > > http://www.freebsd.org/~mch/vfs1.html > > > > Regards, > > > > > > Mike Hancock > -- michaelh@cet.co.jp http://www.cet.co.jp CET Inc., Daiichi Kasuya BLDG 8F, 2-5-12 Higashi Shinbashi, Minato-ku, Tokyo 105 Japan Tel: +81-3-3437-1761 Fax: +81-3-3437-1766
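Constantine's proposal, in which VOP_INACTIVE puts the vnode on the free list and VOP_RECLAIM does the vgone, can be sketched as a small user-space C model. Everything here is hypothetical: `vops_sketch`, `vrele_sketch`, `vrecycle_sketch`, `vlib_freelist_insert`, `vlib_clean` and `examplefs` are illustrative names, not the 4.4BSD/FreeBSD vnode interfaces, and locking, vput and the real vgone are omitted. The point it shows is that the generic release and recycle paths only call through the filesystem's operations vector and never touch the free list themselves.

```c
#include <assert.h>
#include <stddef.h>      /* NULL, used by the sys/queue.h initializer */
#include <sys/queue.h>

/* Hypothetical model: the generic layer delegates free-list insertion
 * and cleaning to the filesystem instead of doing it itself. */
struct vnode;

struct vops_sketch {                 /* not the real vop vector */
    void (*inactive)(struct vnode *vp);
    void (*reclaim)(struct vnode *vp);
};

struct vnode {
    int usecount;
    int on_freelist;
    int cleaned;
    const struct vops_sketch *ops;
    TAILQ_ENTRY(vnode) freelist;
};

static TAILQ_HEAD(, vnode) vnode_free_list =
    TAILQ_HEAD_INITIALIZER(vnode_free_list);

/* Shared "library" helpers a filesystem may choose to call. */
static void vlib_freelist_insert(struct vnode *vp)
{
    TAILQ_INSERT_TAIL(&vnode_free_list, vp, freelist);
    vp->on_freelist = 1;
}

static void vlib_clean(struct vnode *vp)   /* stands in for vgone() */
{
    if (vp->on_freelist) {
        TAILQ_REMOVE(&vnode_free_list, vp, freelist);
        vp->on_freelist = 0;
    }
    vp->cleaned = 1;
}

/* Generic release: drop the reference; no free-list handling here. */
static void vrele_sketch(struct vnode *vp)
{
    assert(vp->usecount > 0);
    if (--vp->usecount == 0)
        vp->ops->inactive(vp);       /* the fs decides what happens next */
}

/* Generic recycle: the filesystem, not this layer, cleans the vnode. */
static void vrecycle_sketch(struct vnode *vp)
{
    vp->ops->reclaim(vp);
}

/* An example filesystem that simply delegates to the shared helpers. */
static void examplefs_inactive(struct vnode *vp) { vlib_freelist_insert(vp); }
static void examplefs_reclaim(struct vnode *vp)  { vlib_clean(vp); }

static const struct vops_sketch examplefs_vops = {
    examplefs_inactive,
    examplefs_reclaim,
};
```

A filesystem that wants its own vnode management could supply different `inactive` and `reclaim` routines without any change to the generic release path, while the `vlib_*` helpers play the role of the "common library functions" Constantine mentions.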