From owner-freebsd-fs Sun Aug 25 00:30:53 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id AAA17081 for fs-outgoing; Sun, 25 Aug 1996 00:30:53 -0700 (PDT) Received: from mx.serv.net (mx.serv.net [199.201.191.10]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id AAA17065; Sun, 25 Aug 1996 00:30:36 -0700 (PDT) Received: from MindBender.serv.net by mx.serv.net (8.7.5/SERV Revision: 2.30 † id AAA15845; Sun, 25 Aug 1996 00:30:42 -0700 (PDT) Received: from localhost.HeadCandy.com (michaelv@localhost.HeadCandy.com [127.0.0.1]) by MindBender.serv.net (8.7.5/8.7.3) with SMTP id AAA07494; Sun, 25 Aug 1996 00:30:21 -0700 (PDT) Message-Id: <199608250730.AAA07494@MindBender.serv.net> X-Authentication-Warning: MindBender.serv.net: Host michaelv@localhost.HeadCandy.com [127.0.0.1] didn't use HELO protocol To: sysseh@devetir.qld.gov.au (Stephen Hocking) cc: freebsd-fs@freebsd.org, current@freebsd.org Subject: Re: The VIVA file system (fwd) In-reply-to: Your message of Sun, 25 Aug 96 03:35:19 +0000. <199608250335.DAA21536@netfl15a.devetir.qld.gov.au> Date: Sun, 25 Aug 1996 00:30:20 -0700 From: "Michael L. VanLoon" Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk >Anybody have opinions on this vs LFS? Are we still waiting for the Lite-2 >stuff, before LFS can go in? The fact that they claim extraordinary performance for a filesystem that sounds like it's only about a third implemented? ;-) Remember the last 10% and 90% of the time/effort... [...] >indirect blocks. Benchmark results of our implementation of VIVA in the >Linux kernel show that it is much faster than Ext2, the default Linux >filesystem, for common file operations. >The Linux implementation of VIVA is a "work in progress". It does not >yet handle partitions larger than 64M (so that the allocation bitmap >fits readily in memory). Individual files are limited to about 8M >(inodes currently have only a single indirect block). There are no >fragments; block size is restricted to 1K. (Adding logical blocks of >larger size will relieve some of these limitations.) So, what is it good for besides development and benchmarks? :-) I'll be more impressed when I see the finished product benchmarked against something else. ----------------------------------------------------------------------------- Michael L. VanLoon michaelv@MindBender.serv.net --< Free your mind and your machine -- NetBSD free un*x >-- NetBSD working ports: 386+PC, Mac 68k, Amiga, Atari 68k, HP300, Sun3, Sun4/4c/4m, DEC MIPS, DEC Alpha, PC532, VAX, MVME68k, arm32... NetBSD ports in progress: PICA, others... ----------------------------------------------------------------------------- From owner-freebsd-fs Sun Aug 25 13:46:41 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id NAA05834 for fs-outgoing; Sun, 25 Aug 1996 13:46:41 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id NAA05808; Sun, 25 Aug 1996 13:46:33 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id NAA21331; Sun, 25 Aug 1996 13:36:27 -0700 From: Terry Lambert Message-Id: <199608252036.NAA21331@phaeton.artisoft.com> Subject: Re: The VIVA file system (fwd) To: dyson@FreeBSD.org Date: Sun, 25 Aug 1996 13:36:26 -0700 (MST) Cc: sysseh@devetir.qld.gov.au, freebsd-fs@FreeBSD.org, current@FreeBSD.org In-Reply-To: <199608250445.XAA05829@dyson.iquest.net> from "John S. 
Dyson" at Aug 24, 96 11:45:53 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@FreeBSD.org X-Loop: FreeBSD.org Precedence: bulk > > Anybody have opinions on this vs LFS? Are we still waiting for the Lite-2 > > stuff, before LFS can go in? > > Looks interesting, but LFS is also. Some of the improvements will appear > when we get our implementation of properly delayed writes working for > UFS. I am sure that someone will take-on LFS when Lite-2 stuff goes > in, even I might (shiver :-)). The VIVA stuff is, I think, overoptimistic. They have made a number of claims in the University of Kentucky papers that were published about two years ago that seem to rely on overly optimistic assumptions about policy and usage. They also seemed to pick "worst case" scenarios for comparison with FFS, and avoided FFS best case. This is not nearly as bad as the MACH MSDOSFS papers, which intentionally handicapped FFS through parameter setting and cache reduction, while caching the entire DOS FAT in core was seen as being acceptable, to compare their work to FFS. But it is certainly not entirely unbiased reporting. Several of their approaches will (if you have read the Herrin/Finkel paper in any depth) apply directly to FFS with little or no modification. The file delete benchmarks are especially telling; they are comparing directory entry manipulation policy, which has little to do with the ability of the file system to store files at all. The dog-legs in the creation result from similar effects. The read and rewrite differences are mostly attributable to policy issues in the use of FFS "optimizations" which are inappropriate to the hardware used. You can see from the block-size-based divergence in the single file case that the dog-legs one would expect when the use of indirect blocks comes into play are missing. This refutes their claim of where the actual wins are coming from. I would be interested to see how VIVA performs in the following: 1) Unified VM/buffer cache implementation 2) Fixed write clustering; the BSD algorithm traditionally has had problems which remain unaddressed (except in prototype code from Matt Day, which has not been externally distributed at all) 3) FFS optimizations for head positioning turned off, and/or use of FFS optimizations on older drive hardware, where the optimizations are not actually pessimizations Finally, I am interested in, but suspicious of, their compression claims, since they also claim that the FFS performance degradation, which Knuth clearly shows to be a hash effect to be expected after an 85% fill (in "Sorting and Searching"), is nonexistent. INRE: "where the wins come from", the "Discussion" reverses the claims made earlier in the paper -- we see that the avoidance of indirect blocks is not the primary win (a conclusion we came to on our own from viewing the earlier graphs). We also see in "Discussion" that caching file beginnings/ends in the inode itself is not a win as they had hoped. In fact, compilation times are pessimized by 25% by it. The final interesting aspect of VIVA is their crash recovery. 
The same effect can be had, with more general applicability, by examining the FS in terms of graph theory, and establishing node-relationship handlers for soft updates: an obvious extension of the Ganger/Patt work on soft updates, which disconnects the soft update mechanism from the FS implementation (the one obvious drawback of the Ganger/Patt work is that it is inherently tied to FFS, and each FS for which it is to be implemented requires its own individual modifications and implementation of timer service routines). If you are looking for something to do a Master's Thesis on, moving applicable VIVA FS techniques into FFS would be one thing to consider, since it would allow you to determine the origin of the effects that are noted in the paper with better accuracy. If you are looking for a porting project, you could do worse. If you are trying to "keep up with Linux", it's probably not worth the direct effort, at least until, as another poster noted, there are sufficient grounds for a side-by-side comparison with a well-studied FS implementation. My opinions, Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-fs Sun Aug 25 18:49:14 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id SAA07359 for fs-outgoing; Sun, 25 Aug 1996 18:49:14 -0700 (PDT) Received: from UKCC.uky.edu (ukcc.uky.edu [128.163.1.170]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id SAA07351; Sun, 25 Aug 1996 18:49:12 -0700 (PDT) Received: from t2.mscf.uky.edu by UKCC.uky.edu (IBM VM SMTP V2R3) with TCP; Sun, 25 Aug 96 21:47:03 EDT Received: from t1.mscf.uky.edu by t2.ms.uky.edu id aa12275; 25 Aug 96 21:45 EDT From: eric@ms.uky.edu Subject: Re: The VIVA file system (fwd) To: Terry Lambert Date: Sun, 25 Aug 1996 21:45:16 -0400 (EDT) Cc: freebsd-fs@freebsd.org, current@freebsd.org In-Reply-To: <199608252036.NAA21331@phaeton.artisoft.com> from "Terry Lambert" at Aug 25, 96 01:36:26 pm X-Mailer: ELM [version 2.4 PL23] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID: <9608252145.aa12275@t2.t2.mscf.uky.edu> Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk I guess I should respond to this thread since I happen to be on all these nice FreeBSD mailing lists nowadays. The Linux version was done by one of Raphael's Master's students. It isn't complete, but it does apparently work (I personally have not seen it nor have I seen the performance figures). As a side note, I am currently working on Viva2, which should be a much more interesting gadget. Anyway, it is in active development again and FreeBSD is the platform this time. > > > Anybody have opinions on this vs LFS? Are we still waiting for the Lite-2 > > > stuff, before LFS can go in? > > > > Looks interesting, but LFS is also. Some of the improvements will appear > > when we get our implementation of properly delayed writes working for > > UFS. I am sure that someone will take-on LFS when Lite-2 stuff goes > > in, even I might (shiver :-)). > > The VIVA stuff is, I think, overoptimistic. > > They have made a number of claims in the University of Kentucky papers > that were published about two years ago that seem to rely on overly > optimistic assumptions about policy and usage. You might explain this one, I'm not sure I know what you mean. The paper was written over three years ago. The work was actually performed from late 1991-1992. 
The AT&T lawsuit came out, I became distracted with making a living, and haven't gotten back to it until a couple months ago. For all the discussion below, you must remember that the platforms for Viva were 1) AT&T SysV, and 2) BSDI's BSD/386. We abandoned SysV because I wanted to release the code, then came the AT&T lawsuit:-( > They also seemed to pick "worst case" scenarios for comparison with FFS, > and avoided FFS best case. We did our testing on clean, freshly newfs'd partitions for the graphs. I don't see how this is "worst case", but perhaps you mean the types of tests we ran. Obviously, we ran some tests that showed a difference between FFS and Viva. > This is not nearly as bad as the MACH MSDOSFS papers, which intentioanlly > handicapped FFS through parameter setting and cache reduction, while > caching the entire DOS FAT in core was seen as being acceptable, to > compare their work to FFS. > > But it is certainly not entirely unbiased reporting. I'm not sure how to react to this. Can one write an "entirely unbiased" report about one's own work? Personally, I don't think so. We tried. I'll leave it at that. <--stuff deleted> > The read and rewrite differences are moslty attributable to policy > issues in the use of FFS "optimizations" which are inapropriate to > the hardware used. The read and rewrite differences are due to the fact FFS didn't do clustering very well at all. BSDI *still* doesn't do it well, but FreeBSD appears to be much better at it. I'm still running tests though and probably will be running tests for some time yet. <--more stuff deleted> > Finally, I am interested in, but suspicious of, their compression > claims, since they also claim that the FFS performance degradation, > which Knuth clearly shows to be a hash effect to be expected after > an 85% fill (in "Sorting and Searching"), to be nonexistant. Well, the results are in the paper. This is what we saw, but you should look at the table carefully. There are places where the effective clustering of a particular file degrades over 50%, but that was (at the time) about as good as FFS ever did anyway. The mean effective clustering always remained very high (90%+). I should have some more modern numbers in a few months, Raphael's student probably has some for Linux now. I think he used the original algorithms. > INRE: "where the wins come from", the "Discussion" reverses the > claims made earlier in the paper -- we see that the avoidance of > indirect blocks is not the primary win (a conclusion we came to > on our own from viewing the earlier graphs). This is correct, the big performance wins came from: 1) Large block sizes and small frag sizes 2) Good clustering 3) Multiple read-ahead > We also see in "Discussion" that caching file beginnings/ends in the > inode itself is not a win as they has hoped. In fact, compilation > times are pessimized by 25% by it. Yes, we were disappointed by that, but it just confirmed what others (Tanenbaum for example) had seen earlier. You should remember that one reason for the degradation was that it threw off all the code that tries to read things in FS block-sized chunks. We wanted to be able to read headers in files quickly and it *does* do that well. We just thought it would be nice to provide that capability (some people have entire file systems dedicated to particular tasks). Some space in the inode can be used for lots of things, perhaps it would be most useful at user level and disjoint from the bytes in the file. 
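To make the inline-header idea above concrete, here is a minimal sketch; the struct and names are invented for illustration and are not Viva's (or FFS's) actual layout:

    /*
     * Illustrative sketch only -- not the actual Viva on-disk format; all
     * names and sizes here are invented.  The idea under discussion: keep
     * the first few bytes of a file inside the inode itself, so a "peek at
     * the header" is served without a separate data-block read, at the
     * cost of confusing code that expects block-sized transfers.
     */
    #include <stdint.h>
    #include <string.h>
    #include <sys/types.h>

    #define INLINE_BYTES 64                 /* invented: spare bytes in the inode */

    struct demo_inode {
            off_t         di_size;                  /* file length in bytes */
            unsigned char di_inline[INLINE_BYTES];  /* copy of the file's first bytes */
            int32_t       di_db[8];                 /* ordinary direct block pointers */
    };

    /*
     * Read "len" bytes at offset "off".  Requests that fall entirely inside
     * the inline area never touch a data block; anything that straddles the
     * boundary falls back to the normal block path (stubbed out here), which
     * is where the block-sized-read assumptions get violated.
     */
    static ssize_t
    demo_read(struct demo_inode *ip, off_t off, void *buf, size_t len)
    {
            if (off >= ip->di_size)
                    return (0);
            if ((off_t)(off + len) > ip->di_size)
                    len = (size_t)(ip->di_size - off);
            if ((off_t)(off + len) <= INLINE_BYTES) {
                    memcpy(buf, ip->di_inline + off, len);  /* header hit: no I/O */
                    return ((ssize_t)len);
            }
            return (-1);    /* stub: real code would map off to di_db[] blocks */
    }

The straddling case is the interesting one: once a request crosses the inline boundary the code pays for both paths, which is consistent with the compile-time pessimization described above.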
Eric From owner-freebsd-fs Mon Aug 26 02:20:07 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id CAA29099 for fs-outgoing; Mon, 26 Aug 1996 02:20:07 -0700 (PDT) Received: from eins.siemens.at (eins.siemens.at [193.81.246.11]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id CAA29011; Mon, 26 Aug 1996 02:19:54 -0700 (PDT) Received: from sol1.gud.siemens.co.at (root@firix [10.1.143.100]) by eins.siemens.at (8.7.4/8.7.3) with SMTP id LAA27206; Mon, 26 Aug 1996 11:17:57 +0200 (MET DST) Received: from ws2301.gud.siemens.co.at by sol1.gud.siemens.co.at with smtp (Smail3.1.28.1 #7 for ) id m0uuxnu-00021iC; Mon, 26 Aug 96 11:17 MET DST Received: by ws2301.gud.siemens.co.at (1.37.109.16/1.37) id AA242520847; Mon, 26 Aug 1996 11:14:07 +0200 From: "Hr.Ladavac" Message-Id: <199608260914.AA242520847@ws2301.gud.siemens.co.at> Subject: Re: The VIVA file system (fwd) To: sysseh@devetir.qld.gov.au (Stephen Hocking) Date: Mon, 26 Aug 1996 11:14:07 +0200 (MESZ) Cc: freebsd-fs@freebsd.org, current@freebsd.org In-Reply-To: <199608250335.DAA21536@netfl15a.devetir.qld.gov.au> from "Stephen Hocking" at Aug 25, 96 03:35:19 am X-Mailer: ELM [version 2.4 PL24 ME8a] Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk E-mail message from Stephen Hocking contained: Anybody have opinions on this vs LFS? Are we still waiting for the Lite-2 stuff, before LFS can go in? > The package contains a paper on VIVA by its developers, Eric H. Herrin > II > and Raphael A. Finkel, and a report on implementing VIVA in Linux by > Shankar Pasupathy. A brief description of VIVA follows. > > The VIVA filesystem was designed to minimize the time taken for file > operations. VIVA achieves this goal by using an allocation policy that > clusters sequentially accessed disk blocks so that disk-head movement > is minimized. VIVA also uses this clustering to compress block addresses > in an inode from 32 bits to 1 bit, relative to traditional filesystems. > This compression allows us to access about 800KB of data without using > indirect blocks. Benchmark results of our implementation of VIVA in the > Linux kernel show that it is much faster than Ext2, the default Linux > filesystem, for common file operations. > > The Linux implementation of VIVA is a "work in progress". It does not > yet handle partitions larger than 64M (so that the allocation bitmap > fits readily in memory). Individual files are limited to about 8M > (inodes currently have only a single indirect block). There are no > fragments; block size is restricted to 1K. (Adding logical blocks of > larger size will relieve some of these limitations.) Sounds sort of like clustering FFS if I'm reading this correctly. Okay, with a lessened need for indirect blocks, but with a typical unix file length distribution, I don't think they gain overmuch. 
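One way to read the "32 bits to 1 bit" claim in the quoted description; this is back-of-the-envelope illustration only, not the actual Viva encoding: if a file's blocks are laid out as one contiguous run from a known base, each logical block needs only a presence bit, so roughly 100 bytes of per-inode bitmap cover the quoted 800KB of 1K blocks with no indirect blocks.

    /*
     * Back-of-the-envelope reading of the "32 bits -> 1 bit" claim; the
     * real Viva encoding may well differ.  With contiguous allocation from
     * a known base, a presence bit replaces a full 32-bit block address:
     * 800 bits (100 bytes) then describe 800 1K blocks, i.e. ~800KB,
     * without indirect blocks.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define NBITS 800                      /* invented: bits of map kept in the inode */

    struct tiny_inode {
            uint32_t base;                 /* physical block where the run starts */
            uint8_t  map[(NBITS + 7) / 8]; /* 1 bit per logical block: 100 bytes */
    };

    /* logical block -> physical block, or -1 if that block was never written */
    static long
    bmap(const struct tiny_inode *ip, unsigned lbn)
    {
            if (lbn >= NBITS || !(ip->map[lbn / 8] & (1u << (lbn % 8))))
                    return (-1);
            return ((long)ip->base + lbn); /* contiguity is what makes 1 bit enough */
    }

    int
    main(void)
    {
            struct tiny_inode ino = { .base = 5000, .map = { 0xff } }; /* blocks 0-7 present */

            printf("map bytes: %zu, reachable without indirects: %d x 1K blocks\n",
                sizeof(ino.map), NBITS);
            printf("logical 3 -> physical %ld\n", bmap(&ino, 3));
            return (0);
    }

The scheme only pays off while files really are allocated contiguously, which is Marino's point about typical Unix file-length distributions.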
/Marino > > Shankar Pasupathy > (shankar@pop.uky.edu) > > > From owner-freebsd-fs Mon Aug 26 02:59:05 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id CAA00948 for fs-outgoing; Mon, 26 Aug 1996 02:59:05 -0700 (PDT) Received: (from hsu@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id CAA00924; Mon, 26 Aug 1996 02:58:58 -0700 (PDT) Date: Mon, 26 Aug 1996 02:58:58 -0700 (PDT) From: Jeffrey Hsu Message-Id: <199608260958.CAA00924@freefall.freebsd.org> To: lada@ws2301.gud.siemens.co.at Subject: Re: The VIVA file system (fwd) Cc: current, freebsd-fs Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk > Are we still waiting for the Lite-2 stuff, before LFS can go in? We're waiting for someone to integrate the latest LFS from Keith and Margo with our VM. Compared to that, the Lite2 VOP changes are minor. There is a new version of LFS in Lite2, but since there's an even newer version available from the author, LFS was left out of the Lite2 integration. From owner-freebsd-fs Mon Aug 26 15:06:10 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id PAA19686 for fs-outgoing; Mon, 26 Aug 1996 15:06:10 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id PAA19658; Mon, 26 Aug 1996 15:05:53 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id OAA23328; Mon, 26 Aug 1996 14:55:12 -0700 From: Terry Lambert Message-Id: <199608262155.OAA23328@phaeton.artisoft.com> Subject: Re: The VIVA file system (fwd) To: eric@ms.uky.edu Date: Mon, 26 Aug 1996 14:55:12 -0700 (MST) Cc: terry@lambert.org, freebsd-fs@freebsd.org, current@freebsd.org In-Reply-To: <9608252145.aa12275@t2.t2.mscf.uky.edu> from "eric@ms.uky.edu" at Aug 25, 96 09:45:16 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk > I guess I should respond to this thread since I happen to be on > all these nice FreeBSD mailing lists nowadays. > > The Linux version was done by one of Raphael's Master's students. > It isn't complete, but it does apparently work (I personally have > not seen it nor have I seen the performance figures). > > As a side note, I am currently working on Viva2, which should be a much > more interesting gadget. Anyway, it is in active development again > and FreeBSD is the platform this time. > > > The VIVA stuff is, I think, overoptimistic. > > > > They have made a number of claims in the University of Kentucky papers > > that were published about two years ago that seem to rely on overly > > optimistic assumptions about policy and usage. > > You might explain this one, I'm not sure I know what you mean. The > paper was written over three years ago. The work was actually > performed from late 1991-1992. The AT&T lawsuit came out, I became > distracted with making a living, and haven't gotten back to it until > a couple months ago. The optimism is in terms of comparative technology. The FFS you talk about in the paper is *not* the FFS FreeBSD is running. Sorry if this seemed like an attack on VIVA; it was intended as an attack on the idea of replacing FFS with VIVA based on the contents of the paper, and the fact that "Linux has it, now we need it!". 
I know that I saw the paper at least two years and 5 months ago, if not before that -- I *think* I saw it the week it came out; there was a presentation by one of the grad students involved to the USL FS gurus: Art Sabsevitch, Wen Ling Lu, etc., of the code on SVR4. > For all the discussion below, you must remember that the platforms for > Viva were 1) AT&T SysV, and 2) BSDI's BSD/386. We abandoned SysV > because I wanted to release the code, then came the AT&T lawsuit:-( I saw the code on #1. That's part of what made me skeptical; the SVR4 FFS implementation was intentionally (IMO) crippled on a lot of defaults and tunables so they could make the claims they did about VXFS. The VXFS code was the shining golden baby. Never mind that it was itself FFS derived (for example, it used SVR4 UFS directory management code without modification). Any comparison against SVR4 UFS as it was will be incredibly biased, even if the bias was not an intentional result of the testing conditions, because the UFS code came pre-biased. 8-(. > > They also seemed to pick "worst case" scenarios for comparison with FFS, > > and avoided FFS best case. > > We did our testing on clean, freshly newfs'd partitions for the graphs. > I don't see how this is "worst case", but perhaps you mean the types > of tests we ran. Obviously, we ran some tests that showed a difference > between FFS and Viva. Well, of course. And without looking at it side-by-side myself, I really couldn't say if there were only positive-for-VIVA comparisons existing, or only positive one presented. The near-full-disk scenario seemed a bit contrived, but *could* be justified in real world usage. Like I said, I'd like to see someone working on a thesis (or with a similar incentive for detail) revisit the whole thing in light of the rewritten FICUS-derived-VFS-based FFS code in FreeBSD, post cache-unification. I wonder if all the locality wins would still hold. My opinion is that at least three of them would not; I'd be happy to be proven wrong, and find out that they are additive to the VM/buffer cache architecture improvements in FreeBSD's VFS/VM interface. > > This is not nearly as bad as the MACH MSDOSFS papers, which intentioanlly > > handicapped FFS through parameter setting and cache reduction, while > > caching the entire DOS FAT in core was seen as being acceptable, to > > compare their work to FFS. > > > > But it is certainly not entirely unbiased reporting. > > I'm not sure how to react to this. Can one write an "entirely > unbiased" report about one's own work? Personally, I don't think > so. We tried. I'll leave it at that. I don't think it's possible, either -- conclusions must always be taken with a grain of salt. It is definitely *not* in the same class as the MACH paper, which exhibited intentional bias. I guess this would have read better if you knew about the MACH paper's taking of license before you read my comment. Sorry about that -- I tend to assume everyone has the same context, or is willing to get it. I wasn't going off half-cocked. > > The read and rewrite differences are moslty attributable to policy > > issues in the use of FFS "optimizations" which are inapropriate to > > the hardware used. > > The read and rewrite differences are due to the fact FFS didn't do > clustering very well at all. BSDI *still* doesn't do it well, but > FreeBSD appears to be much better at it. I'm still running tests > though and probably will be running tests for some time yet. Yes. 
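The clustering being credited here is easy to picture in outline; a toy sketch follows (this is not the kernel's actual cluster code, and the names and limits are made up): gather sequential block writes and issue them as one larger transfer instead of one I/O per logical block.

    /*
     * Toy illustration of write clustering, reduced to user level; not
     * FreeBSD's or BSDI's real code.  Sequential 1K writes are queued and
     * flushed as a single large transfer when the pattern breaks or the
     * run fills up.
     */
    #include <stdio.h>

    #define BSIZE      1024
    #define MAXCLUSTER 64           /* invented: largest run we will coalesce */

    struct cluster {
            long     c_start;       /* first logical block of the pending run */
            unsigned c_count;       /* contiguous blocks queued so far */
    };

    static void
    cluster_flush(struct cluster *cl)
    {
            if (cl->c_count != 0)   /* one big transfer instead of c_count small ones */
                    printf("issue %u blocks of %d bytes at lbn %ld\n",
                        cl->c_count, BSIZE, cl->c_start);
            cl->c_count = 0;
    }

    static void
    cluster_append(struct cluster *cl, long lbn)
    {
            if (cl->c_count != 0 && lbn == cl->c_start + cl->c_count &&
                cl->c_count < MAXCLUSTER) {
                    cl->c_count++;          /* still sequential: keep gathering */
                    return;
            }
            cluster_flush(cl);              /* broke the pattern (or first block) */
            cl->c_start = lbn;
            cl->c_count = 1;
    }

    int
    main(void)
    {
            struct cluster cl = { 0, 0 };
            long lbn;

            for (lbn = 10; lbn < 18; lbn++) /* eight sequential 1K writes ... */
                    cluster_append(&cl, lbn);
            cluster_flush(&cl);             /* ... go out as a single 8K transfer */
            return (0);
    }
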
Like I said, Matt Day's work on this is relevant, but it may never see the light. He's made some comments to the effect of what he has done in brief discussions on the -current list. Even with the recent work, there's a lot of room for improvement in FreeBSD clustering and write-gathering, without needing a new disk layout to get it. As before, I'm not sure if this is additive or parallel to the VIVA development. > > Finally, I am interested in, but suspicious of, their compression > > claims, since they also claim that the FFS performance degradation, > > which Knuth clearly shows to be a hash effect to be expected after > > an 85% fill (in "Sorting and Searching"), to be nonexistant. > > Well, the results are in the paper. This is what we saw, but > you should look at the table carefully. There are places where > the effective clustering of a particular file degrades over 50%, > but that was (at the time) about as good as FFS ever did anyway. > The mean effective clustering always remained very high (90%+). Yes; I didn't make the distinction on "effective clustering", the term which was introduced in the paper. I'm really not sure about one cache effect being superior to another in a unified VM. The FFS does do clustering as well, and it seems that this has more to do with file I/O clustering mapping to disk I/O clustering effectively than anything that might be a real artifact of the FFS layout itself. The address space compression is interesting for vnode-based buffering; FreeBSD currently uses this method, but... well, I've railed often enough against vclean that I think everyone knows how I feel. > I should have some more modern numbers in a few months, Raphael's > student probably has some for Linux now. I think he used the > original algorithms. Yes... really, it wants a new paper (or 3 or 5 of them). There is a lot of room for exciting work in FS development. I just think that it would be ill-advised to expect the same gains in a FreeBSD framework, at least until some basic academic work has taken place that addresses the new situation. > > INRE: "where the wins come from", the "Discussion" reverses the > > claims made earlier in the paper -- we see that the avoidance of > > indirect blocks is not the primary win (a conclusion we came to > > on our own from viewing the earlier graphs). > > This is correct, the big performance wins came from: > > 1) Large block sizes and small frag sizes > 2) Good clustering > 3) Multiple read-ahead Yes; the only one failing in FreeBSD right now is #3. There is some room for improvement in #2 for writes, but that won't affect the read/rewrite speeds (or shouldn't). > > We also see in "Discussion" that caching file beginnings/ends in the > > inode itself is not a win as they has hoped. In fact, compilation > > times are pessimized by 25% by it. > > Yes, we were disappointed by that, but it just confirmed what others > (Tanenbaum for example) had seen earlier. You should remember > that one reason for the degradation was that it threw off all > the code that tries to read things in FS block-sized chunks. > We wanted to be able to read headers in files quickly and it *does* > do that well. We just thought it would be nice to provide that > capability (some people have entire file systems dedicated to > particular tasks). Some space in the inode can be used for lots > of things, perhaps it would be most useful at user level and disjoint > from the bytes in the file. 
I was considering this at one time for executable header information, to cause the pages in the actual binary on disk to be block aligned for more efficient paging at the boundary conditions. This is probably still a win, but might be better handled by moving to a more generic attribution mechanism. I should probably say that I've had experience both with upping the directory block size (for Unicode and multiple name space support), and with doubling the inode size (for use in attribution -- NetWare, Apple, OS/2, and NT file attributes, specifically). I saw similar NULL effects everywhere but for competition with Veritas, which must fault separate pages for their file attribution, and so pay a serious performance penalty for use of attributes, in comparison. So for that case, it maintained the status quo instead of being a loss. I also found it useful to combine the directory lookup and stat operations into a single system call, saving two protection domain crossings, and to pre-fault the inode asynchronously when the directory entry was referenced. This last assumed locality of usage, and was probably more related to the application (attributed kernel FS for NetWare services in a UNIX environment) than to a general win. Like the file rewrite benchmark in the VIVA paper, this was a specialized usage, and more related to implementation than architecture... for file rewrites on a block boundary granularity, it is not strictly necessary to fault in the blocks to be rewritten from the disk, using the historical read-before-write, and then comparing the (obviously bad) numbers that result. To my knowledge, partial page sequential writes on block or multiple-of-block granularity are possible, but not currently supported by the FreeBSD VM. If supported, I expect that the win from doing the rewrite that way will dwarf any small relative win into insignificance. In any case, the VIVA code is worth pursuing; just not for the reasons that seemed to be behind the original posting ("Linux has it, we must have it"). If someone wants to pursue mining it for a thesis, it's far better than some of the other areas people tend to look to. Regards, Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. 
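The block-granularity rewrite argument in the message above comes down to one decision, shown here as a sketch only; the block routines are stand-in stubs and nothing below is FreeBSD VM or buffer cache code:

    /*
     * Sketch of the read-before-write point: a write that covers a whole
     * block can overwrite it blindly, while a partial-block write must
     * first read the block to preserve the untouched bytes -- the
     * "historical read-before-write" being criticized above.
     */
    #include <string.h>

    #define BSIZE 1024

    static void read_block(long lbn, unsigned char *buf)  { (void)lbn; (void)buf; }
    static void write_block(long lbn, unsigned char *buf) { (void)lbn; (void)buf; }

    static void
    rewrite_range(long off, const unsigned char *src, long len)
    {
            unsigned char blk[BSIZE];

            while (len > 0) {
                    long lbn   = off / BSIZE;
                    long inoff = off % BSIZE;
                    long n     = (len < BSIZE - inoff) ? len : BSIZE - inoff;

                    if (inoff == 0 && n == BSIZE) {
                            memcpy(blk, src, BSIZE);  /* full block: no fault-in needed */
                    } else {
                            read_block(lbn, blk);     /* partial block: must merge */
                            memcpy(blk + inoff, src, (size_t)n);
                    }
                    write_block(lbn, blk);
                    off += n; src += n; len -= n;
            }
    }

A rewrite benchmark that always takes the read-before-write branch, even for whole-block writes, measures the implementation rather than the disk layout, which is the point being made.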
From owner-freebsd-fs Mon Aug 26 16:03:14 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id QAA22641 for fs-outgoing; Mon, 26 Aug 1996 16:03:14 -0700 (PDT) Received: from UKCC.uky.edu (ukcc.uky.edu [128.163.1.170]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id QAA22634; Mon, 26 Aug 1996 16:03:11 -0700 (PDT) Received: from t2.mscf.uky.edu by UKCC.uky.edu (IBM VM SMTP V2R3) with TCP; Mon, 26 Aug 96 19:01:03 EDT Received: from t1.mscf.uky.edu by t2.ms.uky.edu id aa24476; 26 Aug 96 18:58 EDT From: eric@ms.uky.edu Subject: Re: The VIVA file system (fwd) To: Terry Lambert Date: Mon, 26 Aug 1996 18:58:09 -0400 (EDT) Cc: freebsd-fs@freebsd.org, current@freebsd.org In-Reply-To: <199608262155.OAA23328@phaeton.artisoft.com> from "Terry Lambert" at Aug 26, 96 02:55:12 pm X-Mailer: ELM [version 2.4 PL23] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID: <9608261858.aa24476@t2.t2.mscf.uky.edu> Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk > I know that I saw the paper at least two years and 5 months ago, if not > before that -- I *think* I saw it the week it came out; there was a > presentation by one of the grad students involved to the USL FS gurus: > Art Sabsevitch, Wen Ling Lu, etc., of the code on SVR4. > I was the sole implementor of all versions of Viva. No other grad students were involved at the time... > > > For all the discussion below, you must remember that the platforms for > > Viva were 1) AT&T SysV, and 2) BSDI's BSD/386. We abandoned SysV > > because I wanted to release the code, then came the AT&T lawsuit:-( > > I saw the code on #1. That's part of what made me skeptical; the > SVR4 FFS implementation was intentionally (IMO) crippled on a lot > of defaults and tunables so they could make the claims they did > about VXFS. The VXFS code was the shining golden baby. Never mind > that it was itself FFS derived (for example, it used SVR4 UFS directory > management code without modification). Any comparison against SVR4 > UFS as it was will be incredibly biased, even if the bias was not > an intentional result of the testing conditions, because the UFS > code came pre-biased. 8-(. Are you talking about VIFS or VXFS? I seem to remember that VXFS was the Veritas File System. Veritas had nothing to do with Viva. Perhaps you are confusing the two. 
Eric From owner-freebsd-fs Mon Aug 26 16:16:08 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id QAA23165 for fs-outgoing; Mon, 26 Aug 1996 16:16:08 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id QAA23147; Mon, 26 Aug 1996 16:16:05 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id QAA23479; Mon, 26 Aug 1996 16:05:29 -0700 From: Terry Lambert Message-Id: <199608262305.QAA23479@phaeton.artisoft.com> Subject: Re: The VIVA file system (fwd) To: eric@ms.uky.edu Date: Mon, 26 Aug 1996 16:05:29 -0700 (MST) Cc: terry@lambert.org, freebsd-fs@freebsd.org, current@freebsd.org In-Reply-To: <9608261858.aa24476@t2.t2.mscf.uky.edu> from "eric@ms.uky.edu" at Aug 26, 96 06:58:09 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk > > I know that I saw the paper at least two years and 5 months ago, if not > > before that -- I *think* I saw it the week it came out; there was a > > presentation by one of the grad students involved to the USL FS gurus: > > Art Sabsevitch, Wen Ling Lu, etc., of the code on SVR4. > > I was the sole implementor of all versions of Viva. No other grad > students were involved at the time... Did you do the presentation? I'm sure it was a U of KY grad student who was interning at USL. > Are you talking about VIFS or VXFS? I seem to remember that > VXFS was the Veritas File System. Veritas had nothing to do > with Viva. Perhaps you are confusing the two. VIVA in general, VXFS in the specific instance of why an SVR4 UFS comparison isn't really a strong comparison. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-fs Mon Aug 26 16:33:19 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id QAA23620 for fs-outgoing; Mon, 26 Aug 1996 16:33:19 -0700 (PDT) Received: from UKCC.uky.edu (ukcc.uky.edu [128.163.1.170]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id QAA23615; Mon, 26 Aug 1996 16:33:16 -0700 (PDT) Received: from t2.mscf.uky.edu by UKCC.uky.edu (IBM VM SMTP V2R3) with TCP; Mon, 26 Aug 96 19:31:08 EDT Received: from t1.mscf.uky.edu by t2.ms.uky.edu id aa27777; 26 Aug 96 19:29 EDT From: eric@ms.uky.edu Subject: Re: The VIVA file system (fwd) To: Terry Lambert Date: Mon, 26 Aug 1996 19:29:06 -0400 (EDT) Cc: freebsd-fs@freebsd.org, current@freebsd.org In-Reply-To: <199608262305.QAA23479@phaeton.artisoft.com> from "Terry Lambert" at Aug 26, 96 04:05:29 pm X-Mailer: ELM [version 2.4 PL23] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID: <9608261929.aa27777@t2.t2.mscf.uky.edu> Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk > > > > I know that I saw the paper at least two years and 5 months ago, if not > > > before that -- I *think* I saw it the week it came out; there was a > > > presentation by one of the grad students involved to the USL FS gurus: > > > Art Sabsevitch, Wen Ling Lu, etc., of the code on SVR4. > > > > I was the sole implementor of all versions of Viva. No other grad > > students were involved at the time... > > Did you do the presentation? I'm sure it was a U of KY grad student > who was interning at USL. Nope. 
It is possible that some grad student did a presentation without a demo, but they weren't involved with the development or implementation of Viva. > > Are you talking about VIFS or VXFS? I seem to remember that > > VXFS was the Veritas File System. Veritas had nothing to do > > with Viva. Perhaps you are confusing the two. > > VIVA in general, VXFS in the specific instance of why an SVR4 UFS > comparison isn't really a strong comparison. Ok, but we never released *any* results from our UFS implementation. I determined that 1) I couldn't release the code, and 2) UFS on SysV was crippled and didn't make a good comparison. We dropped it totally at that point and I picked up a source copy of BSDI. Eric From owner-freebsd-fs Mon Aug 26 17:38:46 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id RAA27637 for fs-outgoing; Mon, 26 Aug 1996 17:38:46 -0700 (PDT) Received: from bunyip.cc.uq.oz.au (daemon@bunyip.cc.uq.oz.au [130.102.2.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id RAA27616; Mon, 26 Aug 1996 17:38:38 -0700 (PDT) Received: (from daemon@localhost) by bunyip.cc.uq.oz.au (8.7.5/8.7.3) id KAA06970; Tue, 27 Aug 1996 10:38:28 +1000 Received: from netfl15a.devetir.qld.gov.au by pandora.devetir.qld.gov.au (8.6.10/DEVETIR-E0.3a) with ESMTP id KAA16808; Tue, 27 Aug 1996 10:24:04 +1000 Received: from localhost by netfl15a.devetir.qld.gov.au (8.6.8.1/DEVETIR-0.1) id AAA00161; Tue, 27 Aug 1996 00:24:58 GMT Message-Id: <199608270024.AAA00161@netfl15a.devetir.qld.gov.au> X-Mailer: exmh version 1.6.5 12/11/95 To: Jeffrey Hsu cc: current@freefall.freebsd.org, freebsd-fs@freefall.freebsd.org Subject: Re: The VIVA file system (fwd) In-reply-to: Your message of "Mon, 26 Aug 1996 02:58:58 MST." <199608260958.CAA00924@freefall.freebsd.org> X-Face: 3}heU+2?b->-GSF-G4T4>jEB9~FR(V9lo&o>kAy=Pj&;oVOc<|pr%I/VSG"ZD32J>5gGC0N 7gj]^GI@M:LlqNd]|(2OxOxy@$6@/!,";-!OlucF^=jq8s57$%qXd/ieC8DhWmIy@J1AcnvSGV\|*! >Bvu7+0h4zCY^]{AxXKsDTlgA2m]fX$W@'8ev-Qi+-;%L'CcZ'NBL!@n?}q!M&Em3*eW7,093nOeV8 M)(u+6D;%B7j\XA/9j4!Gj~&jYzflG[#)E9sI&Xe9~y~Gn%fA7>F:YKr"Wx4cZU*6{^2ocZ!YyR Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Tue, 27 Aug 1996 10:24:55 +1000 From: Stephen Hocking Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk > > Are we still waiting for the Lite-2 stuff, before LFS can go in? > > We're waiting for someone to integrate the latest LFS from Keith > and Margo with our VM. Compared to that, the the Lite2 VOP changes > are minor. There is a new version of LFS in Lite2, but since > there's an even newer version available from the author, LFS was > left out of the Lite2 integration. (Gulp) Who's the someone? Stephen -- The views expressed above are not those of the Worker's Compensation Board of Queensland, Australia. 
From owner-freebsd-fs Mon Aug 26 17:40:44 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id RAA27845 for fs-outgoing; Mon, 26 Aug 1996 17:40:44 -0700 (PDT) Received: from bunyip.cc.uq.oz.au (daemon@bunyip.cc.uq.oz.au [130.102.2.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id RAA27834 for ; Mon, 26 Aug 1996 17:40:40 -0700 (PDT) Received: (from daemon@localhost) by bunyip.cc.uq.oz.au (8.7.5/8.7.3) id KAA07268; Tue, 27 Aug 1996 10:40:23 +1000 Received: from netfl15a.devetir.qld.gov.au by pandora.devetir.qld.gov.au (8.6.10/DEVETIR-E0.3a) with ESMTP id KAA17125; Tue, 27 Aug 1996 10:41:50 +1000 Received: from localhost by netfl15a.devetir.qld.gov.au (8.6.8.1/DEVETIR-0.1) id AAA00727; Tue, 27 Aug 1996 00:42:46 GMT Message-Id: <199608270042.AAA00727@netfl15a.devetir.qld.gov.au> X-Mailer: exmh version 1.6.5 12/11/95 To: Terry Lambert cc: freebsd-fs@freebsd.org, eric@ms.uky.edu Subject: Re: The VIVA file system (fwd) In-reply-to: Your message of "Mon, 26 Aug 1996 14:55:12 MST." <199608262155.OAA23328@phaeton.artisoft.com> X-Face: 3}heU+2?b->-GSF-G4T4>jEB9~FR(V9lo&o>kAy=Pj&;oVOc<|pr%I/VSG"ZD32J>5gGC0N 7gj]^GI@M:LlqNd]|(2OxOxy@$6@/!,";-!OlucF^=jq8s57$%qXd/ieC8DhWmIy@J1AcnvSGV\|*! >Bvu7+0h4zCY^]{AxXKsDTlgA2m]fX$W@'8ev-Qi+-;%L'CcZ'NBL!@n?}q!M&Em3*eW7,093nOeV8 M)(u+6D;%B7j\XA/9j4!Gj~&jYzflG[#)E9sI&Xe9~y~Gn%fA7>F:YKr"Wx4cZU*6{^2ocZ!YyR Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Tue, 27 Aug 1996 10:42:44 +1000 From: Stephen Hocking Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk [......] > > In any case, the VIVA code is worth pursuing; just not for the reasons > that seemed to be behind the original posting ("Linux has it, we must > have it"). If someone wants to pursue it mining for a thesis, it's > far better than some of the other areas people tend to look to. > I must admit that it looked rather similar to the LFS stuff, so the forwarding of the message was meant to stimulate some discussion of the merits of both, rather than as urging for people to play "catch up". I'm getting a little impatient for LFS and was wondering what was happening. Stephen -- The views expressed above are not those of the Worker's Compensation Board of Queensland, Australia. From owner-freebsd-fs Mon Aug 26 17:42:12 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id RAA27983 for fs-outgoing; Mon, 26 Aug 1996 17:42:12 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id RAA27966; Mon, 26 Aug 1996 17:42:07 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id AAA16894; Tue, 27 Aug 1996 00:41:37 GMT Date: Tue, 27 Aug 1996 09:41:37 +0900 (JST) From: Michael Hancock Reply-To: Michael Hancock To: Terry Lambert cc: eric@ms.uky.edu, freebsd-fs@FreeBSD.ORG, current@FreeBSD.ORG Subject: vclean (was The VIVA file system) In-Reply-To: <199608262155.OAA23328@phaeton.artisoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk On Mon, 26 Aug 1996, Terry Lambert wrote: > The address space compression is interesting for vnode-based buffering; > FreeBSD currently uses this method, but... well, I've railed often > enough against vclean that I think everyone knows how I feel. > Some subsystems now depend on the ability to disassociate buffers from vnodes. 
For example, kill the session leader and the session terminal is revoked from all children and the deadfs is associated with the vnode. The only call that doesn't return an error is close() and the children eventually exit. I think what needs to be looked at is having more synchronized buffer cache/vnode recycling policies. Regards, Mike Hancock From owner-freebsd-fs Mon Aug 26 19:04:32 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id TAA07230 for fs-outgoing; Mon, 26 Aug 1996 19:04:32 -0700 (PDT) Received: (from hsu@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id TAA07207; Mon, 26 Aug 1996 19:04:27 -0700 (PDT) Date: Mon, 26 Aug 1996 19:04:27 -0700 (PDT) From: Jeffrey Hsu Message-Id: <199608270204.TAA07207@freefall.freebsd.org> To: sysseh@devetir.qld.gov.au Subject: Re: The VIVA file system (fwd) Cc: current@freefall.freebsd.org, freebsd-fs@freefall.freebsd.org Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk >>> Are we still waiting for the Lite-2 stuff, before LFS can go in? >> >> We're waiting for someone to integrate the latest LFS from Keith >> and Margo with our VM. > >(Gulp) Who's the someone? > > Stephen Ask not for whom the bell tolls. From owner-freebsd-fs Mon Aug 26 19:31:14 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id TAA16798 for fs-outgoing; Mon, 26 Aug 1996 19:31:14 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id TAA16702; Mon, 26 Aug 1996 19:31:06 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id TAA23773; Mon, 26 Aug 1996 19:20:17 -0700 From: Terry Lambert Message-Id: <199608270220.TAA23773@phaeton.artisoft.com> Subject: Re: vclean (was The VIVA file system) To: michaelh@cet.co.jp Date: Mon, 26 Aug 1996 19:20:17 -0700 (MST) Cc: terry@lambert.org, eric@ms.uky.edu, freebsd-fs@FreeBSD.ORG, current@FreeBSD.ORG In-Reply-To: from "Michael Hancock" at Aug 27, 96 09:41:37 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk > > The address space compression is interesting for vnode-based buffering; > > FreeBSD currently uses this method, but... well, I've railed often > > enough against vclean that I think everyone knows how I feel. > > Some subsystems now depend on the ability to disassociate buffers from > vnodes. For example, kill the session leader and the session terminal is > revoked from all children and the deadfs is associated with the vnode. > The only call that doesn't return an error is close() and the children > eventually exit. > > I think what needs to be looked at is having more synchronized buffer > cache/vnode recycling policies. Inode data, disklabel data, and any other FS object which is not file contents is not cached under the current policy. Further, dissociating buffers from vnodes does not require that they be returned to a global pool for clean-behind. I think the non-opacity of vnodes is a mistake. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. 
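The opacity argument is easy to sketch in outline; the names below are invented and are not the real 4.4BSD vnode interfaces. If everything outside the VFS has to go through accessors, the buffer/vnode recycling policy being debated here lives in one place instead of in every subsystem that pokes at the struct directly.

    /* --- what other subsystems would see (a header) --- */
    struct xvnode;                                /* layout deliberately hidden */
    void   xvref(struct xvnode *vp);              /* take a reference */
    void   xvrele(struct xvnode *vp);             /* drop a reference */
    int    xvrefcount(const struct xvnode *vp);   /* read-only query */

    /* --- visible only inside the VFS implementation --- */
    struct xvnode {
            int   xv_usecount;      /* reference count */
            void *xv_data;          /* per-FS node (inode, nfsnode, ...) */
            void *xv_bufs;          /* associated buffers; policy lives here only */
    };

    void
    xvref(struct xvnode *vp)
    {
            vp->xv_usecount++;
    }

    void
    xvrele(struct xvnode *vp)
    {
            if (--vp->xv_usecount == 0) {
                    /* recycle here, under one policy, instead of ad hoc vclean() */
            }
    }

    int
    xvrefcount(const struct xvnode *vp)
    {
            return (vp->xv_usecount);
    }
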
From owner-freebsd-fs Mon Aug 26 19:38:06 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id TAA21441 for fs-outgoing; Mon, 26 Aug 1996 19:38:06 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id TAA21409 for ; Mon, 26 Aug 1996 19:38:03 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id TAA23789; Mon, 26 Aug 1996 19:27:11 -0700 From: Terry Lambert Message-Id: <199608270227.TAA23789@phaeton.artisoft.com> Subject: Re: The VIVA file system (fwd) To: sysseh@devetir.qld.gov.au (Stephen Hocking) Date: Mon, 26 Aug 1996 19:27:10 -0700 (MST) Cc: terry@lambert.org, freebsd-fs@freebsd.org, eric@ms.uky.edu In-Reply-To: <199608270042.AAA00727@netfl15a.devetir.qld.gov.au> from "Stephen Hocking" at Aug 27, 96 10:42:44 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk > I must admit that it looked rather similar to the LFS stuff, so the > forwarding of the message was meant to stimulate some discussion of > the merits of both, rather than as urging for people to play "catch up". > I'm getting a little impatient for LFS and was wondering what was > happening. I'm not the one to ask about LFS; it's interesting, but not enough for me to want to hack on it. I think the Lite2 did not integrate it over the vnode locking changes, so it's more than a little work, since it needs that, plus it needs the VM integration (our VM access should be macrotized for a generic VM for porting anyway). It's all quite nasty in there. I'm also led to believe that Keith Bostic is hacking on the Ganger/Patt soft updates code to integrate it into FFS right now. This means that it will only get worse before it gets better; the Ganger/Patt code is FS specific, when implemented per their sample implementation in the online Appendix A of their paper. It does not consider the FS as a directed graph with commutation and association operators to be triggered to resolve node dependencies. This means it is hard to make a generic "soft updates" implementation that can act as the device interface for all file systems across the board. Doing that work is probably not a short term project, but it could be tackled on an FS-by-FS basis if you were clever about it. Still, it's a lot of work. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-fs Mon Aug 26 20:29:13 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id UAA03661 for fs-outgoing; Mon, 26 Aug 1996 20:29:13 -0700 (PDT) Received: from dyson.iquest.net (dyson.iquest.net [198.70.144.127]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id UAA03654 for ; Mon, 26 Aug 1996 20:29:10 -0700 (PDT) Received: (from root@localhost) by dyson.iquest.net (8.7.5/8.6.9) id WAA01133; Mon, 26 Aug 1996 22:26:30 -0500 (EST) From: "John S. 
Dyson" Message-Id: <199608270326.WAA01133@dyson.iquest.net> Subject: Re: The VIVA file system (fwd) To: terry@lambert.org (Terry Lambert) Date: Mon, 26 Aug 1996 22:26:30 -0500 (EST) Cc: sysseh@devetir.qld.gov.au, terry@lambert.org, freebsd-fs@freebsd.org, eric@ms.uky.edu In-Reply-To: <199608270227.TAA23789@phaeton.artisoft.com> from "Terry Lambert" at Aug 26, 96 07:27:10 pm Reply-To: dyson@freebsd.org X-Mailer: ELM [version 2.4 PL24 ME8] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk > > I must admit that it looked rather similar to the LFS stuff, so the > > forwarding of the message was meant to stimulate some discussion of > > the merits of both, rather than as urging for people to play "catch up". > > I'm getting a little impatient for LFS and was wondering what was > > happening. > > I'm not the one to ask about LFS; it's interesting, but not enough > for me to want to hack on it. I think the Lite2 did not integrate > it over the vnode locking changes, so it's more than a little work, > since it needs that, plus it needs the VM integration (our VM access > should be macrotized for a generic VM for porting anyway). It's all > quite nasty in there. > I am tonight porting LFS, maybe with some luck will get it to work. However, the FreeBSD VM/Buffer cache stuff affects the filesystem only slightly esp. now. The biggest issue in porting from the old Lite stuff is getting rid of gratuitous vnode_pager_uncaches, and changing the ordering of some operations slightly. John From owner-freebsd-fs Mon Aug 26 20:35:45 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id UAA03980 for fs-outgoing; Mon, 26 Aug 1996 20:35:45 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id UAA03935; Mon, 26 Aug 1996 20:35:35 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id DAA18200; Tue, 27 Aug 1996 03:35:15 GMT Date: Tue, 27 Aug 1996 12:35:14 +0900 (JST) From: Michael Hancock Reply-To: Michael Hancock To: Terry Lambert cc: eric@ms.uky.edu, freebsd-fs@FreeBSD.ORG, current@FreeBSD.ORG Subject: Re: vclean (was The VIVA file system) In-Reply-To: <199608270220.TAA23773@phaeton.artisoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk On Mon, 26 Aug 1996, Terry Lambert wrote: > > I think what needs to be looked at is having more synchronized buffer > > cache/vnode recycling policies. > > Inode data, disklabel data, and any other FS object which is not file > contents is not cached under the current policy. The vnode/inode association with a vnhash() you mentioned before makes sense. I wonder how hard it would be to manage the buffer cache/vnodes/inodes with more synergy (sorry I couldn't think of a better word). > Further, dissociating buffers from vnodes does not require that they > be returned to a global pool for clean-behind. There's an in-place free list that can have valid buffers hanging off of them and vnodes go on the list when inactive() is called. I guess the freelist should be called the inactive list. getnewvnode() vgone() vclean() should only be called when it needs to, such as when file activity moves to a different fs and there aren't enough vnodes. 
The vnode pool was a fixed size pool in lite, but someone put in a malloc() into getnewvnode(). The vnode pool is kind of wired so I think it can now grow, but it can't shrink unless there's some free()s being done somewhere where I haven't noticed. > I think the non-opacity of vnodes is a mistake. I guess they didn't have time to get this aspect right. Some of the semantics are very interesting though, they look very different from the SysV vnodes I read about in the Vahalia book. Regards, Mike Hancock From owner-freebsd-fs Mon Aug 26 23:29:33 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id XAA15806 for fs-outgoing; Mon, 26 Aug 1996 23:29:33 -0700 (PDT) Received: from obelix.cica.es (obelix.cica.es [150.214.1.10]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id XAA15767; Mon, 26 Aug 1996 23:29:23 -0700 (PDT) Received: (from amora@localhost) by obelix.cica.es (8.7.5/8.7.3) id IAA08884; Tue, 27 Aug 1996 08:26:55 +0200 (GMT-2:00) From: "Jesus A. Mora Marin" Message-Id: <199608270626.IAA08884@obelix.cica.es> Subject: s5 filesys implementation? To: fs@freebsd.org Date: Tue, 27 Aug 1996 08:26:55 +0200 (MET) Cc: questions@freebsd.org, hackers@freebsd.org X-Mailer: ELM [version 2.4 PL25] Content-Type: text Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Greetings! I am planning to implement support for s5 filesys, since I have a particular interest in this subject. But I'd hate to do an unnecessary work, so PLEASE let me know if any fellow is working already on this stuff, or if this has been done, in fact. I am very out of date: the last version I have is 2.2-960326SNAP -and waiting eagerly for the 2.1.5 CD-ROM!-, and there was no support for this oldie. TIA. Jesus A. Mora amora@obelix.cica.es PS: I am not subscribed to the hackers mail list (too smart for me :) Please cc to my e-mail address. From owner-freebsd-fs Tue Aug 27 08:47:36 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id IAA10787 for fs-outgoing; Tue, 27 Aug 1996 08:47:36 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id IAA10777; Tue, 27 Aug 1996 08:47:21 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id IAA24689; Tue, 27 Aug 1996 08:33:53 -0700 From: Terry Lambert Message-Id: <199608271533.IAA24689@phaeton.artisoft.com> Subject: Re: The VIVA file system (fwd) To: dyson@freebsd.org Date: Tue, 27 Aug 1996 08:33:53 -0700 (MST) Cc: terry@lambert.org, sysseh@devetir.qld.gov.au, freebsd-fs@freebsd.org, eric@ms.uky.edu In-Reply-To: <199608270326.WAA01133@dyson.iquest.net> from "John S. Dyson" at Aug 26, 96 10:26:30 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk > I am tonight porting LFS, maybe with some luck will get it to work. However, > the FreeBSD VM/Buffer cache stuff affects the filesystem only slightly esp. > now. The biggest issue in porting from the old Lite stuff is getting > rid of gratuitious vnode_pager_uncaches, and changing the ordering of > some operations slightly. Any chance you can replace them with a null macro, instead of just diking them out, in order to make the code less VM implementation dependent? 
Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-fs Tue Aug 27 08:59:34 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id IAA11438 for fs-outgoing; Tue, 27 Aug 1996 08:59:34 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id IAA11358; Tue, 27 Aug 1996 08:56:34 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id IAA24710; Tue, 27 Aug 1996 08:45:33 -0700 From: Terry Lambert Message-Id: <199608271545.IAA24710@phaeton.artisoft.com> Subject: Re: vclean (was The VIVA file system) To: michaelh@cet.co.jp Date: Tue, 27 Aug 1996 08:45:33 -0700 (MST) Cc: terry@lambert.org, eric@ms.uky.edu, freebsd-fs@FreeBSD.ORG, current@FreeBSD.ORG In-Reply-To: from "Michael Hancock" at Aug 27, 96 12:35:14 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk > > > I think what needs to be looked at is having more synchronized buffer > > > cache/vnode recycling policies. > > > > Inode data, disklabel data, and any other FS object which is not file > > contents is not cached under the current policy. > > The vnode/inode association with a vnhash() you mentioned before makes > sense. I wonder how hard it would be to manage the buffer > cache/vnodes/inodes with more synergy (sorry I couldn't think of a better > word). You're forgiven... you just need to get with the new "paradigm". 8-) 8-). > > Further, dissociating buffers from vnodes does not require that they > > be returned to a global pool for clean-behind. > > There's an in-place free list that can have valid buffers hanging off of > them and vnodes go on the list when inactive() is called. I guess the > freelist should be called the inactive list. > > getnewvnode() > vgone() > vclean() should only be called when it needs to, such as when > file activity moves to a different fs and there aren't enough vnodes. > > The vnode pool was a fixed size pool in lite, but someone put in a > malloc() into getnewvnode(). The vnode pool is kind of wired so I think > it can now grow, but it can't shrink unless there's some free()s being > done somewhere where I haven't noticed. Yes. The number one problem is that the in-place freelist with valid buffers hanging off of them is not recoverable once the inode data has been disassociated. The vnode is effectively unrecoverable without the buffers being freed. This is very annoying; at the very least, going the other direction, where the vnode is treated as an opaque handle external to the VFS itself instead of as a common "subsystem" in kern/, would allow the buffers to be recovered via the ihash. This is, in fact, the wrong thing to do, since it would require the implementation of an ihash per FS. Better to consider name cache references in the directory lookup cache as if they were vnode references, and push the vnodes into a per FS pool, LRU'ing the shared references out of the name cache. This would result in recoverability for the buffers, at the same time removing the annoying race conditions that show up every time the VM system is tweaked as "free vnode isn't" or other such crap. The vnode management as it stands is much too fragile. > > I think the non-opacity of vnodes is a mistake. 
>
> I guess they didn't have time to get this aspect right.
>
> Some of the semantics are very interesting though, they look very
> different from the SysV vnodes I read about in the Vahalia book.

Yes; there is no reason to lose that in going to a per FS vrelease; the
most interesting semantic is stacking.

I don't think that right now there is a guard against unmounting with
a reference active if the data reference to the FS from the vnode is
not asserted. This seems to be a problem for a couple of FS's that
operate on the basis of virtual nodes (I don't know if devfs is
implicated for sure yet; I'm still looking at that panic).

For what it's worth, the vnode problems all seem to be related to lack
of data abstraction -- promiscuous use of vnode data, etc. -- and that
is not impossible to clean up, just time consuming.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-fs Tue Aug 27 09:25:25 1996
Return-Path: owner-fs
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id JAA13368 for fs-outgoing; Tue, 27 Aug 1996 09:25:25 -0700 (PDT)
Received: from who.cdrom.com (who.cdrom.com [204.216.27.3]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id JAA13361; Tue, 27 Aug 1996 09:25:24 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by who.cdrom.com (8.7.5/8.6.11) with SMTP id JAA00139 ; Tue, 27 Aug 1996 09:25:21 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id JAA24806; Tue, 27 Aug 1996 09:12:58 -0700
From: Terry Lambert
Message-Id: <199608271612.JAA24806@phaeton.artisoft.com>
Subject: Re: s5 filesys implementation?
To: amora@obelix.cica.es (Jesus A. Mora Marin)
Date: Tue, 27 Aug 1996 09:12:58 -0700 (MST)
Cc: fs@freebsd.org, questions@freebsd.org, hackers@freebsd.org
In-Reply-To: <199608270626.IAA08884@obelix.cica.es> from "Jesus A. Mora Marin" at Aug 27, 96 08:26:55 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-fs@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> I am planning to implement support for s5 filesys, since I have a
> particular interest in this subject. But I'd hate to do an unnecessary
> work, so PLEASE let me know if any fellow is working already on this
> stuff, or if this has been done, in fact. I am very out of date: the
> last version I have is 2.2-960326SNAP -and waiting eagerly for the
> 2.1.5 CD-ROM!-, and there was no support for this oldie. TIA.

This is on my list of "to do" items. I was an engineer for Novell/USG
(USL) who worked on, among other things, kernel FS code. The recent
SCO offer (and the supposedly yet-to-come offer from them on UnixWare)
makes this a lot easier.

I have been waiting on the devfs so that I can code physical-to-logical
device translation drivers for SVR4/SCO partitioning and disklabelling
and treat the things as raw devices; this support, at least, is now
imminent.

If you are planning on working on FS code in the very near future,
then I'd say go ahead. Otherwise, I hope the landscape will be changing
pretty radically pretty soon -- if you have done enough FS work that
you can abstract framework components from implementation, then you
should be pretty safe for any long term projects that you want to
pursue.

You may want to consider holding off until the Lite2 integration has
been completed, since it changes some of the architecture.
I don't think it's safe to assume that the changes from that direction
are done either.

Your best bet would be to get in contact with David Greenman or John
Dyson, since they make architectural decisions, and the FS is one
place that will be hit (one way or the other) by almost all
architectural changes.

Regards,
Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-fs Tue Aug 27 10:11:17 1996
Return-Path: owner-fs
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id KAA15640 for fs-outgoing; Tue, 27 Aug 1996 10:11:17 -0700 (PDT)
Received: from dyson.iquest.net (dyson.iquest.net [198.70.144.127]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id KAA15627; Tue, 27 Aug 1996 10:11:13 -0700 (PDT)
Received: (from root@localhost) by dyson.iquest.net (8.7.5/8.6.9) id MAA00624; Tue, 27 Aug 1996 12:10:14 -0500 (EST)
From: "John S. Dyson"
Message-Id: <199608271710.MAA00624@dyson.iquest.net>
Subject: Re: The VIVA file system (fwd)
To: terry@lambert.org (Terry Lambert)
Date: Tue, 27 Aug 1996 12:10:14 -0500 (EST)
Cc: dyson@FreeBSD.org, terry@lambert.org, sysseh@devetir.qld.gov.au, freebsd-fs@FreeBSD.org, eric@ms.uky.edu
In-Reply-To: <199608271533.IAA24689@phaeton.artisoft.com> from "Terry Lambert" at Aug 27, 96 08:33:53 am
Reply-To: dyson@FreeBSD.org
X-Mailer: ELM [version 2.4 PL24 ME8]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-fs@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

> > I am tonight porting LFS, maybe with some luck will get it to work. However,
> > the FreeBSD VM/Buffer cache stuff affects the filesystem only slightly esp.
> > now. The biggest issue in porting from the old Lite stuff is getting
> > rid of gratuitious vnode_pager_uncaches, and changing the ordering of
> > some operations slightly.
>
> Any chance you can replace them with a null macro, instead of just diking
> them out, in order to make the code less VM implementation dependent?
>
I'll try, but my main purpose is:

1) A diversion, because even though I love the VM stuff, I am kind-of
   tired of it right now.

2) Get enough of it running, so that someone else can make it work "well"
   and/or productize it. My mainline plate is overwhelmingly full.

3) The fact that LFS hasn't been working is a festering sore, and I am
   mostly putting anti-bacterial ointment on it :-), so the limb doesn't
   fall off :-).

Most, if not all, of the changes that I am making are bugfixes. The
frequency of vnode_pager_uncache use in the old fs code (for example)
is simply wrong, and if I work on LFS too much, it'll simply get
rewritten to aggressively use the VM system and eliminate lots of very
evil copies :-). Then, diffs or ifdefs will be of NO value :-).
John From owner-freebsd-fs Tue Aug 27 16:37:36 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id QAA09496 for fs-outgoing; Tue, 27 Aug 1996 16:37:36 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id QAA09470; Tue, 27 Aug 1996 16:37:30 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id XAA25859; Tue, 27 Aug 1996 23:37:13 GMT Date: Wed, 28 Aug 1996 08:37:13 +0900 (JST) From: Michael Hancock To: Terry Lambert cc: eric@ms.uky.edu, freebsd-fs@FreeBSD.ORG, current@FreeBSD.ORG Subject: Re: vclean (was The VIVA file system) In-Reply-To: <199608271545.IAA24710@phaeton.artisoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk On Tue, 27 Aug 1996, Terry Lambert wrote: > Yes. The number one problem is that the in-place freelist with valid > buffers hanging off of them is not recoverable once the inode data has > been disassociated. The vnode is effectively unrecoverable without > the buffers being freed. > This was the point I was missing. What is disassociating the inode and when is it happening? Regards, Mike From owner-freebsd-fs Wed Aug 28 04:21:22 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id EAA20285 for fs-outgoing; Wed, 28 Aug 1996 04:21:22 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id EAA20263; Wed, 28 Aug 1996 04:21:17 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id LAA00080; Wed, 28 Aug 1996 11:21:08 GMT Date: Wed, 28 Aug 1996 20:21:07 +0900 (JST) From: Michael Hancock To: Terry Lambert cc: eric@ms.uky.edu, freebsd-fs@FreeBSD.ORG, current@FreeBSD.ORG Subject: Re: vclean (was The VIVA file system) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk On Wed, 28 Aug 1996, I wrote: > On Tue, 27 Aug 1996, Terry Lambert wrote: > > > Yes. The number one problem is that the in-place freelist with valid > > buffers hanging off of them is not recoverable once the inode data has > > been disassociated. The vnode is effectively unrecoverable without > > the buffers being freed. > > > > This was the point I was missing. What is disassociating the inode and > when is it happening? Yikes! I took a look below, but I didn't expect to see vgone() calls in ufs_inactive(). if (vp->v_usecount == 0 && ip->i_mode == 0) vgone(vp); I need to figure out what ip->i_mode == 0 means. 
Regards, Mike Hancock From owner-freebsd-fs Wed Aug 28 09:57:56 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id JAA13405 for fs-outgoing; Wed, 28 Aug 1996 09:57:56 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id JAA13383; Wed, 28 Aug 1996 09:57:47 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id JAA26908; Wed, 28 Aug 1996 09:46:22 -0700 From: Terry Lambert Message-Id: <199608281646.JAA26908@phaeton.artisoft.com> Subject: Re: vclean (was The VIVA file system) To: michaelh@cet.co.jp (Michael Hancock) Date: Wed, 28 Aug 1996 09:46:22 -0700 (MST) Cc: terry@lambert.org, eric@ms.uky.edu, freebsd-fs@FreeBSD.ORG, current@FreeBSD.ORG In-Reply-To: from "Michael Hancock" at Aug 28, 96 08:37:13 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk > > Yes. The number one problem is that the in-place freelist with valid > > buffers hanging off of them is not recoverable once the inode data has > > been disassociated. The vnode is effectively unrecoverable without > > the buffers being freed. > > This was the point I was missing. What is disassociating the inode and > when is it happening? vfs_subr.c: /* * Disassociate the underlying file system from a vnode. */ static void vclean(struct vnode *vp, int flags, struct proc *p) Any time vclean is called... 8-(. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-fs Wed Aug 28 10:04:41 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id KAA13799 for fs-outgoing; Wed, 28 Aug 1996 10:04:41 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id KAA13671; Wed, 28 Aug 1996 10:01:49 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id JAA26928; Wed, 28 Aug 1996 09:50:18 -0700 From: Terry Lambert Message-Id: <199608281650.JAA26928@phaeton.artisoft.com> Subject: Re: vclean (was The VIVA file system) To: michaelh@cet.co.jp (Michael Hancock) Date: Wed, 28 Aug 1996 09:50:18 -0700 (MST) Cc: terry@lambert.org, eric@ms.uky.edu, freebsd-fs@FreeBSD.ORG, current@FreeBSD.ORG In-Reply-To: from "Michael Hancock" at Aug 28, 96 08:21:07 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk > > This was the point I was missing. What is disassociating the inode and > > when is it happening? > > Yikes! I took a look below, but I didn't expect to see vgone() calls in > ufs_inactive(). > > if (vp->v_usecount == 0 && ip->i_mode == 0) > vgone(vp); > > I need to figure out what ip->i_mode == 0 means. The file type is a non-zero value in the high bits of the mode word; it means that the inode does not refer to real data any more. The vgone call is just part of the subsystem I think should be replaced wholesale; I'd like to see a per FS vrele() (back to locally managed pools) replace most of those calls. The vgone() calls vgone1() calls vclean, and we're back in my hate-zone. 
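As a small aside on the mode-word test discussed above, here is a sketch of the check, using the standard <sys/stat.h> type bits; the helper name is invented for illustration:

#include <sys/stat.h>

/*
 * A live inode always carries a non-zero file type (S_IFREG, S_IFDIR,
 * S_IFCHR, ...) in the S_IFMT bits of its mode word.  UFS zeroes the
 * mode when the on-disk inode is freed, so i_mode == 0 means the vnode
 * no longer refers to real data and is only being kept around by the
 * vnode layer itself.
 */
static int
inode_is_dead(mode_t mode)
{
	return ((mode & S_IFMT) == 0);
}

Under that reading, the ufs_inactive() test quoted earlier fires exactly when a vnode whose inode has already been freed loses its last reference.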
Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. From owner-freebsd-fs Wed Aug 28 10:33:49 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id KAA15642 for fs-outgoing; Wed, 28 Aug 1996 10:33:49 -0700 (PDT) Received: from dyson.iquest.net (dyson.iquest.net [198.70.144.127]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id KAA15619; Wed, 28 Aug 1996 10:33:43 -0700 (PDT) Received: (from root@localhost) by dyson.iquest.net (8.7.5/8.6.9) id MAA00358; Wed, 28 Aug 1996 12:31:38 -0500 (EST) From: John Dyson Message-Id: <199608281731.MAA00358@dyson.iquest.net> Subject: Re: vclean (was The VIVA file system) To: terry@lambert.org (Terry Lambert) Date: Wed, 28 Aug 1996 12:31:38 -0500 (EST) Cc: michaelh@cet.co.jp, terry@lambert.org, eric@ms.uky.edu, freebsd-fs@FreeBSD.ORG, current@FreeBSD.ORG In-Reply-To: <199608281650.JAA26928@phaeton.artisoft.com> from "Terry Lambert" at Aug 28, 96 09:50:18 am Reply-To: dyson@FreeBSD.ORG X-Mailer: ELM [version 2.4 PL24 ME8] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk > > The file type is a non-zero value in the high bits of the mode word; > it means that the inode does not refer to real data any more. > > The vgone call is just part of the subsystem I think should be replaced > wholesale; I'd like to see a per FS vrele() (back to locally managed > pools) replace most of those calls. The vgone() calls vgone1() calls > vclean, and we're back in my hate-zone. > > I tend to agree with you that it makes more sense to have VOP_VGET and friends be on a per-filesystem basis (well, you know what I mean -- we should consider creating VOP_VGET/VOP_VREF/VOP_VRELE/VOP_VPUT.) I think that it would certainly clean up the layering abstraction that we have. In fact, I had created ssuch a few years ago -- but like usual, VM issues take up all of my time. Right now, I am finding more problems with LFS than I had thought. Not to worry, should still have a soln this weekend. John From owner-freebsd-fs Wed Aug 28 17:30:34 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id RAA18568 for fs-outgoing; Wed, 28 Aug 1996 17:30:34 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id RAA18471; Wed, 28 Aug 1996 17:30:08 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id AAA04895; Thu, 29 Aug 1996 00:29:07 GMT Date: Thu, 29 Aug 1996 09:29:07 +0900 (JST) From: Michael Hancock To: Terry Lambert cc: eric@ms.uky.edu, freebsd-fs@FreeBSD.ORG, current@FreeBSD.ORG Subject: Re: vclean (was The VIVA file system) In-Reply-To: <199608281650.JAA26928@phaeton.artisoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk On Wed, 28 Aug 1996, Terry Lambert wrote: > > > This was the point I was missing. What is disassociating the inode and > > > when is it happening? > > > > Yikes! I took a look below, but I didn't expect to see vgone() calls in > > ufs_inactive(). > > > > if (vp->v_usecount == 0 && ip->i_mode == 0) > > vgone(vp); > > > > I need to figure out what ip->i_mode == 0 means. 
> The file type is a non-zero value in the high bits of the mode word;
> it means that the inode does not refer to real data any more.
>
> The vgone call is just part of the subsystem I think should be replaced
> wholesale; I'd like to see a per FS vrele() (back to locally managed
> pools) replace most of those calls. The vgone() calls vgone1() calls
> vclean, and we're back in my hate-zone.

My interpretation of the vnode global pool design was that
vgone...->vclean wouldn't be called very often. It would only be called
by getnewvnode() when free vnodes were not available and for cases when
the vnode is deliberately revoked.

Inactive() would mark both the vnode/inode inactive, but the data would be
left intact even when usecount went to zero so that all the important data
could be reactivated quickly.

It's not working this way and it doesn't look trivial to get it to work
this way.

Regarding local per fs pools, you still need some kind of global memory
management policy. It seems less complicated to manage a global pool
than local per fs pools with opaque VOP calls.

Say you've got FFS, LFS, and NFS systems mounted and fs usage patterns
migrate between the fs's. You've got limited memory resources. How do
you determine which local pool to recover vnodes from? It'd be
inefficient to leave the pools wired until the fs was unmounted. Complex
LRU-like policies across multiple local per fs vnode pools also sound
pretty complicated to me.

We also need to preserve the vnode revoking semantics for situations like
revoking the session terminals from the children of session leaders.

Regards,

Mike Hancock

From owner-freebsd-fs Thu Aug 29 09:33:37 1996
Return-Path: owner-fs
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id JAA17387 for fs-outgoing; Thu, 29 Aug 1996 09:33:37 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id JAA17356; Thu, 29 Aug 1996 09:33:12 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id JAA28774; Thu, 29 Aug 1996 09:16:20 -0700
From: Terry Lambert
Message-Id: <199608291616.JAA28774@phaeton.artisoft.com>
Subject: Re: vclean (was The VIVA file system)
To: michaelh@cet.co.jp (Michael Hancock)
Date: Thu, 29 Aug 1996 09:16:20 -0700 (MST)
Cc: terry@lambert.org, eric@ms.uky.edu, freebsd-fs@FreeBSD.ORG, current@FreeBSD.ORG
In-Reply-To: from "Michael Hancock" at Aug 29, 96 09:29:07 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-fs@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> My interpretation of the vnode global pool design was that
> vgone...->vclean wouldn't be called very often. It would only be called
> by getnewvnode() when free vnodes were not available and for cases when
> the vnode is deliberately revoked.
>
> Inactive() would mark both the vnode/inode inactive, but the data would be
> left intact even when usecount went to zero so that all the important data
> could be reactivated quickly.
>
> It's not working this way and it doesn't look trivial to get it to work
> this way.

That's right. This is a natural consequence of moving the cache locality
from its separate location into its now unified location.
Because you can not look up a buffer by device (and the device
association would never be destroyed for a valid buffer in core, yet
unreclaimed), the buffers on the vnodes in the pool lack the locality
of the pre VM/cache unification code.

The unification was such a tremendous win that this was either hidden,
or more likely, discounted. I'd like to see it revisited.

> Regarding local per fs pools, you still need some kind of global memory
> management policy. It seems less complicated to manage a global pool
> than local per fs pools with opaque VOP calls.

The amount of memory is relatively small, and we are already running
a modified zone allocator in any case. I don't see any conflict in
the definition of additional zones. How do I reclaim a packet reassembly
buffer when I need another vnode? Right now, I don't. The conflict
resolution is intra-pool. Inter-pool conflicts are resolved either
by static resource limits, or soft limits and/or watermarking.

> Say you've got FFS, LFS, and NFS systems mounted and fs usage patterns
> migrate between the fs's. You've got limited memory resources. How do
> you determine which local pool to recover vnodes from? It'd be
> inefficient to leave the pools wired until the fs was unmounted. Complex
> LRU-like policies across multiple local per fs vnode pools also sound
> pretty complicated to me.

You keep a bias statistic, maintained on a per pool basis, for the
reclamation, and the reclaimer operates at a pool granularity, if
in fact you allow such reclamation to occur (see my preceding paragraph
for preferred approaches to a knowledgeable reclaimer).

> We also need to preserve the vnode revoking semantics for situations like
> revoking the session terminals from the children of session leaders.

This is a tty subsystem function, and I do not agree with the current
revocation semantics, mostly because I think tty devices should be
instanced per controlling tty reference. This would allow the reference
to be invalidated via flagging rather than using a separate opv table.

If you look for "struct fileops", you will see another bogosity that
makes this problematic. Resolve the struct fileops, and the carrying
around of all that dead weight in the fd structs, and you have resolved
the deadfs problem at the same time. The specfs stuff is going to go
away with devfs, leaving UNIX domain sockets, pipes (which should be
implemented as an opaque FS reference not exported as a mount point
mapping to user space), and the VFS fileops (which should be the only
ones, and therefore implicit, anyway).

It's really not as complicated as you want to make it. 8-).

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-fs Fri Aug 30 00:09:48 1996
Return-Path: owner-fs
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id AAA15473 for fs-outgoing; Fri, 30 Aug 1996 00:09:48 -0700 (PDT)
Received: from obelix.cica.es (obelix.cica.es [150.214.1.10]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id AAA15452 for ; Fri, 30 Aug 1996 00:09:44 -0700 (PDT)
Received: (from amora@localhost) by obelix.cica.es (8.7.5/8.7.3) id JAA02749; Fri, 30 Aug 1996 09:05:52 +0200 (GMT-2:00)
From: "Jesus A. Mora Marin"
Message-Id: <199608300705.JAA02749@obelix.cica.es>
Subject: Re: s5 filesys implementation?
To: terry@lambert.org (Terry Lambert) Date: Fri, 30 Aug 1996 09:05:51 +0200 (MET) Cc: dyson@iquest.net, jkh@time.cdrom.com, freebsd-fs@freebsd.org In-Reply-To: <199608271612.JAA24806@phaeton.artisoft.com> from "Terry Lambert" at Aug 27, 96 09:12:58 am X-Mailer: ELM [version 2.4 PL25] Content-Type: text Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk I'll try to answer to everybody with this. Terry Lambert said: > This is onmy list of "to do" items. I was an engineer for Novell/USG > (USL) who worked on, among other things, kernel FS code. Glad to meet you, Terry. I didn't mean to jeopardize anyone's 'to-do' list :) > If you are planning on working on FS code in the very near future, > then I'd say go ahead. Otherwise, I hope the lanscape will be changing > pretty radically pretty soon -- if you have done enough FS work that > you can abstract framework components from implementation, then you > should be pretty safe for any long term projects that you want to > pursue. > > You may want to consider holding off until the Lite2 integration has > been completed; since it changes some of the architecture. I don't > think it;s safe to assume that the changes from that direction are > done either. > > Your best bet would be to get in contact with David Greenman or John > Dyson, since they make architectural decisions, and the FS is one > place that will be hit (one way or the other) by almost all > architectural changes. Argh! Can you please stop messing a while? I can imagine the picture: a red light starts to blink at the FreeBSD Hackers Command Center. "Hey! Jesus has got a minimal understanding about proc scheduling. Let's change all the stuff right now!" :) Now serious. My main objective with this project is educational. I've been studying the implementation of VFS in SysVR4, and was appealed by its design. Just for fun, I took the FreeBSD code for this subsystem and saw it's organized in a very similar way, if not identical. Furthermore, this code seemed rather clear to me, so I have decided to try a practical approach. I know that support for sysv fs is not a very useful thing, but it's not implemented yet and I've been living with it for many years. sysv fs is not a very sophisticated one and has a very simple organization, so fs-dependent code would not be very painful to write. Also, inode struct is almost identical to that of ffs, and no anaesthetical tricks are needed as those required when you deal with msdosfs, so the implementation should be clean. For some time, I was considering simply to port the code in Linux, but I was not aware of how it performs, nor that its integration in FreeBSD is a straightforward task. So I've decided to start from scratch, but I don't discard to "borrow" some ideas from the Linux code -with its author's consent, of course-. Now, an overview of what I plan is: *) Further learning of the vfs subsystem in FreeBSD, and its interface with fs-dependent code. This phase does not necessarily precedes the others, but overlaps them (crashes teach a lot :) 1) Implementation of fs type dependent VFS ops, ie s5fs_mount(), s5fs_sync(), and so on. I should begin this right now. I'd start working with SCO Unix SVR3 filesys in diskettes only. If I become confident enough, I could install an old SCO Unix version in my HD. I think that giving support for xenix filesystems could be considered, if I find any diskette to work with. User-level mount_s5 command must be avaible at this point. 
Funny, you can (u)mount it, stat it, but no more :) Anyway, I guess this stage won't take more than a few days. 2) Implementation of fs type dependent vnode ops, (s5fs_open/close, _read/write, ...). I must consider which functions to implement and which ones are nonsensical in a non-primary filesystem type. Suggestions appreciated. 3) Writing additional user-level commands: fsck_s5, mkfs_s5, fsdb_s5? 4) Last but not least: trying to write a doc describing the design and internals of FreeBSD VFS subsystem, and my own experiences with s5 support implementation. It could be a potential contribution to the Documentation Project, that is, given it worths its space in disk and if some masochist volunteer translates my weird "English" into a correct one. Should I get just this one, I will be happy. You can see, I am not a Unix guru, nor a hacker, but I will do my best this time. Also, I hope you will lend me a hand when needed -I'll get stuck, you can bet-. BTW, John Dyson : > I suggest working with the Aug snapshot or later. What's the most recent snapshot stable enough? I run 2.2-960326SNAP at job and had no problems with it. At home, I run 2.1-RELEASE, and received 2.1.5-RELEASE yesterday, but I'm considering to jump to a 2.2-SNAP. Please info and I'll order. So please, hints? suggestions? discouraging comments? funds (cash only please)? Jesus A. Mora amora@obelix.cica.es From owner-freebsd-fs Fri Aug 30 08:49:31 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id IAA28044 for fs-outgoing; Fri, 30 Aug 1996 08:49:31 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id IAA27960 for ; Fri, 30 Aug 1996 08:48:50 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id IAA00629; Fri, 30 Aug 1996 08:35:00 -0700 From: Terry Lambert Message-Id: <199608301535.IAA00629@phaeton.artisoft.com> Subject: Re: s5 filesys implementation? To: amora@obelix.cica.es (Jesus A. Mora Marin) Date: Fri, 30 Aug 1996 08:35:00 -0700 (MST) Cc: terry@lambert.org, dyson@iquest.net, jkh@time.cdrom.com, freebsd-fs@freebsd.org In-Reply-To: <199608300705.JAA02749@obelix.cica.es> from "Jesus A. Mora Marin" at Aug 30, 96 09:05:51 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Jesus A. Mora writes: [ ... ] > So please, hints? suggestions? discouraging comments? funds (cash > only please)? What I said was not meant to discourage you, only to point out that there are bear-traps in the room, and people who will shoot at your feet. The approach you outline seems to tread carefully enough; I don't think you will have major problems. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. 
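An editor's sketch prompted by the implementation plan above: roughly the shape of the two fs-type-dependent VFS entry points named in item 1. The signatures follow the 4.4BSD-Lite style used by ffs and should be checked against the tree being built on; everything here is a hypothetical stub, not working code.

#include <sys/param.h>
#include <sys/namei.h>
#include <sys/proc.h>
#include <sys/ucred.h>
#include <sys/mount.h>
#include <sys/errno.h>

static int
s5fs_mount(struct mount *mp, char *path, caddr_t data,
    struct nameidata *ndp, struct proc *p)
{
	/*
	 * Read the s5 superblock from the device named by the mount
	 * arguments, sanity-check its magic number, build the in-core
	 * mount state, hang it off mp->mnt_data, and fill in
	 * mp->mnt_stat for statfs().
	 */
	return (EOPNOTSUPP);		/* stub: not implemented yet */
}

static int
s5fs_sync(struct mount *mp, int waitfor, struct ucred *cred, struct proc *p)
{
	/*
	 * Walk the mount's vnodes, flush dirty inodes and data blocks,
	 * and rewrite the superblock if it has been modified.
	 */
	return (0);			/* stub: nothing is cached yet */
}

The user-level mount_s5 command would then mainly need to package its arguments into the data blob that s5fs_mount() decodes.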
From owner-freebsd-fs Fri Aug 30 12:58:54 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id MAA16294 for fs-outgoing; Fri, 30 Aug 1996 12:58:54 -0700 (PDT) Received: from Mail.IDT.NET (mail.idt.net [198.4.75.205]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id MAA16287 for ; Fri, 30 Aug 1996 12:58:51 -0700 (PDT) Received: from sequoia (ppp-39.ts-1.mlb.idt.net [169.132.71.39]) by Mail.IDT.NET (8.7.4/8.7.3) with SMTP id PAA20873; Fri, 30 Aug 1996 15:58:28 -0400 (EDT) Message-ID: <3227481B.7BBE@mail.idt.net> Date: Fri, 30 Aug 1996 15:59:23 -0400 From: Gary Corcoran Reply-To: garycorc@mail.idt.net X-Mailer: Mozilla 3.0 (WinNT; U) MIME-Version: 1.0 To: "Jesus A. Mora Marin" CC: freebsd-fs@freebsd.org Subject: Re: s5 filesys implementation? References: <199608300705.JAA02749@obelix.cica.es> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Jesus A. Mora Marin wrote: > Now, an overview of what I plan is: > *) Further learning of the vfs subsystem in FreeBSD, and its interface with > fs-dependent code. This phase does not necessarily precedes the others, > but overlaps them (crashes teach a lot :) > > 1) Implementation of fs type dependent VFS ops, ie s5fs_mount(), s5fs_sync(), > and so on. I should begin this right now. I'd start working with SCO Unix > SVR3 filesys in diskettes only. If I become confident enough, I could > install an old SCO Unix version in my HD. I think that giving support for > xenix filesystems could be considered, if I find any diskette to work with. > User-level mount_s5 command must be avaible at this point. Funny, you can > (u)mount it, stat it, but no more :) Anyway, I guess this stage won't > take more than a few days. > < more useful stuff deleted> > So please, hints? suggestions? discouraging comments? funds (cash only please)? > > Jesus A. Mora > amora@obelix.cica.es I have lots of old ATT SVR3/SVR4 floppies with (S5?) filesystems on them. It would be great if I could mount them under FreeBSD, so I'd be glad to see you complete your project if it will allow this. I'd offer to help, but I simply don't have the time... Gary Corcoran From owner-freebsd-fs Sat Aug 31 03:00:19 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id DAA02567 for fs-outgoing; Sat, 31 Aug 1996 03:00:19 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id DAA02541; Sat, 31 Aug 1996 03:00:10 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id JAA10724; Sat, 31 Aug 1996 09:59:58 GMT Date: Sat, 31 Aug 1996 18:59:57 +0900 (JST) From: Michael Hancock To: Terry Lambert cc: eric@ms.uky.edu, freebsd-fs@freebsd.org, current@freebsd.org Subject: Re: vclean (was The VIVA file system) In-Reply-To: <199608291616.JAA28774@phaeton.artisoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk On Thu, 29 Aug 1996, Terry Lambert wrote: > The amount of memeory is relatively small, and we are already running > a modified zone allocator in any case. I don't see any conflict in > the definition of a dditional zones. How do I reclaim packet reassembly > buffer when I need another vnode? Right now, I don't. The conflict > resoloution is intra-pool. 
Inter-pool conflicts are resolved either
> by static resource limits, or soft limits and/or watermarking.

I think watermarking is a good model to program to. From the point of
view of users, you don't want to see them unless you run into a problem
where you need to look at them.

I'd like to see some kind of automatic setting of low/high watermarks
based on the resources of the computer that can be overridden by the
admin.

>
> > Say you've got FFS, LFS, and NFS systems mounted and fs usage patterns
> > migrate between the fs's. You've got limited memory resources. How do
> > you determine which local pool to recover vnodes from? It'd be
> > inefficient to leave the pools wired until the fs was unmounted. Complex
> > LRU-like policies across multiple local per fs vnode pools also sound
> > pretty complicated to me.
>
> You keep a bias statistic, maintained on a per pool basis, for the
> reclamation, and the reclaimer operates at a pool granularity, if
> in fact you allow such reclamation to occur (see my preceding paragraph
> for preferred approaches to a knowledgeable reclaimer).

I'd like to revisit this later. I'm not sure I'd want to see the ability
to reclaim go away.

> > We also need to preserve the vnode revoking semantics for situations like
> > revoking the session terminals from the children of session leaders.
>
> This is a tty subsystem function, and I do not agree with the current
> revocation semantics, mostly because I think tty devices should be
> instanced per controlling tty reference. This would allow the reference
> to be invalidated via flagging rather than using a separate opv table.
>
> If you look for "struct fileops", you will see another bogosity that
> makes this problematic. Resolve the struct fileops, and the
> carrying around of all that dead weight in the fd structs, and you have
> resolved the deadfs problem at the same time. The specfs stuff is going
> to go away with devfs, leaving UNIX domain sockets, pipes (which should
> be implemented as an opaque FS reference not exported as a mount point
> mapping to user space), and the VFS fileops (which should be the only
> ones, and therefore implicit, anyway).

Hanging the deadfs ops on the vnode seemed like a cool idea even though
it looks like a little extra baggage.

I guess we can revisit all this again later after the lite2 merge.

Regards,

Mike Hancock
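As a closing illustration of the automatically derived, admin-overridable watermarks discussed above, here is a sketch; every name and scaling factor is invented, and the bias field is only a placeholder for the per-pool statistic a reclaimer might consult.

struct vnpool_limits {
	unsigned long	low;	/* below this many free entries, start reclaiming */
	unsigned long	high;	/* at this many free entries, stop reclaiming */
	unsigned long	bias;	/* per-pool recent-use statistic for the reclaimer */
};

/*
 * Derive defaults from the machine's physical memory (in pages), but
 * let explicit non-zero admin settings win, sysctl-style.
 */
void
vnpool_set_limits(struct vnpool_limits *lim, unsigned long physpages,
    unsigned long admin_low, unsigned long admin_high)
{
	unsigned long def = physpages / 64;	/* arbitrary default scaling */

	if (def < 32)
		def = 32;			/* floor for small machines */
	lim->low = admin_low ? admin_low : def;
	lim->high = admin_high ? admin_high : def * 4;
	if (lim->high <= lim->low)
		lim->high = lim->low + 1;	/* keep the pair ordered */
	lim->bias = 0;
}

Whether the reclaimer then works against one global pool or against per-fs pools, the watermark pair and the bias statistic are the knobs the discussion above is arguing about.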