From owner-freebsd-fs Mon Jul 8 07:05:43 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id HAA05606 for fs-outgoing; Mon, 8 Jul 1996 07:05:43 -0700 (PDT) Received: from mail.ruhrgebiet.individual.net (in-ruhr.ruhr.de [193.100.176.38]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id HAA05601 for ; Mon, 8 Jul 1996 07:05:34 -0700 (PDT) Received: by mail.ruhrgebiet.individual.net (8.7.1/8.6.12) with UUCP id PAA01079 for freebsd.org!freebsd-fs; Mon, 8 Jul 1996 15:24:15 +0200 (MET DST) Received: by robkaos.ruhr.de (/\oo/\ Smail3.1.29.1 #29.1) id ; Sun, 7 Jul 96 22:34 MET DST Message-Id: From: robsch@robkaos.ruhr.de (Robert Schien) Subject: procfs To: freebsd-fs@freebsd.org Date: Sun, 7 Jul 1996 22:34:07 +0200 (MET DST) X-Mailer: ELM [version 2.4 PL24] Content-Type: text Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk

Is it possible to freeze the current status of a process and restart it at a later time so that it begins execution at the point where it was frozen? My problem is that I do some number crunching: I want to save the process and restart it the next day or so. It would be nice to have such a feature. Does any kind of *nix or other OS support this?
TIA Robert From owner-freebsd-fs Mon Jul 8 11:49:16 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id LAA29450 for fs-outgoing; Mon, 8 Jul 1996 11:49:16 -0700 (PDT) Received: from baygate.bayarea.net (baygate.bayarea.net [204.71.212.2]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id LAA29442 for ; Mon, 8 Jul 1996 11:49:14 -0700 (PDT) Received: (from mcnab@localhost) by baygate.bayarea.net (8.6.9/8.6.9) id LAA27246; Mon, 8 Jul 1996 11:42:51 -0700 Date: Mon, 8 Jul 1996 11:42:51 -0700 From: David McNab Message-Id: <199607081842.LAA27246@baygate.bayarea.net> To: robsch@robkaos.ruhr.de CC: freebsd-fs@freebsd.org In-reply-to: (robsch@robkaos.ruhr.de) Subject: Re: procfs Reply-to: David McNab Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Robert wrote: |Is it possible to freeze the current status of a process and |restart it at a later time so that it begins execution at |the point where it was frozen? You can suspend it, which means it won't eat any CPU time and can be easily paged out (won't consume any memory -- well, hardly any). But this won't be persistent across boots. If you want real checkpointing -- the kernel writes the executable and relevant context to a file and can later restart it -- then the only UNIX OS I'm familiar with that provides it, albeit in a slightly limited way, is UNICOS, Cray's UNIX. The hardware overhead's a bitch, though :^). It's a hard problem. There's lots of state scattered throughout the "system" that's hard to record and regenerate. It's especially hard if you are doing any networking, because then you have state in foreign address spaces. Most people seem to end up writing their number cruncher so that it periodically hits a "sync point" where they can easily checkpoint it themselves. 
-- Dave McNab

From owner-freebsd-fs Mon Jul 8 14:16:55 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id OAA09297 for fs-outgoing; Mon, 8 Jul 1996 14:16:55 -0700 (PDT) Received: from ra.ibr.cs.tu-bs.de (ra.ibr.cs.tu-bs.de [134.169.246.34]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id OAA09265 for ; Mon, 8 Jul 1996 14:16:26 -0700 (PDT) Received: from achill [134.169.34.18] by ra.ibr.cs.tu-bs.de (8.6.10/tubsibr) with ESMTP id XAA24073; Mon, 8 Jul 1996 23:14:46 +0200 Received: from petri@localhost by achill.ibr.cs.tu-bs.de (8.6.10/tubsibr) id XAA24089; Mon, 8 Jul 1996 23:14:45 +0200 Date: Mon, 8 Jul 1996 23:14:45 +0200 From: Stefan Petri Message-Id: <199607082114.XAA24089@achill.ibr.cs.tu-bs.de> To: freebsd-fs@freefall.freebsd.org, mcnab@bayarea.net, robsch@robkaos.ruhr.de Subject: Checkpointing [Re: procfs] Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

A collection of references about checkpointing and process migration can be found at http://www.cs.tu-bs.de/~petri/pgmigrefs.html

Stefan

From owner-freebsd-fs Tue Jul 9 19:26:51 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id TAA27512 for fs-outgoing; Tue, 9 Jul 1996 19:26:51 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id TAA27493; Tue, 9 Jul 1996 19:26:43 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id CAA29004; Wed, 10 Jul 1996 02:26:40 GMT Date: Wed, 10 Jul 1996 11:26:40 +0900 (JST) From: Michael Hancock To: freebsd-fs@FreeBSD.ORG cc: freebsd-current@FreeBSD.ORG Subject: Fixing Union_mounts In-Reply-To: <199606251931.MAA00496@phaeton.artisoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

[Please trim off current and leave fs when replying]

Terry
posted this reply to the "making in /usr/src" thread. I'd like to see all this stackable fs stuff made usable.

I have some questions on Terry's remedies items 2) and 4) below:

2) Moving vnode locking to the vnode from the per fs inode will help fix the stacking problems, but what will it do for future advanced file systems that need to have special locking requirements?

4) Moving the vnodes from the global pool to a per fs pool to improve locality of reference. Won't this make it hard to manage memory? How will efficient reclaim operations be implemented?

This stacked fs stuff is really cool. You can implement a simple undelete in the Union layer by making whiteout entries (see the 4.4 daemon book). This would only work for the duration of the mount, unlike Novell's persistent transactional stuff, but it is still very useful.

There are already crypto-fs implementations out there, but I'd like to see more; especially non-ITAR-restricted ones that can be used world-wide.

Regards, Mike Hancock

On Tue, 25 Jun 1996, Terry Lambert wrote:
> This is the intrinsic "union" option.
>
> It does not work.
>
> It does not work because VOP_ADVLOCK does not veto.
>
> It does not work because VOP_LOCK can not be stacked because it is
> stupidly referencing flags specific to the underlying vnode for lock
> resolution instead of the union vnode.
>
> It does not work because VOP_LOOKUP, VOP_RENAME, etc. can not
> be stacked because they actually deallocate path structures that
> were allocated by code in vfs_syscalls.c, instead of the buffers
> being deallocated in vfs_syscalls.c as well, as you would expect
> in a proper idempotent layering implementation.
>
> VOP_LOCK stupidly references these flags because vclean needs them.
>
> vclean is an abomination before God, and is a half-kludge to deal
> with not having both vnode/offset and dev/offset based cache
> references simultaneously.
>
> Use of vnode/offset cache entries is a result of the unified cache
> implementation.
It saves a bmap call when moving data to/from
> user space. It's why FreeBSD has faster I/O than most other systems.
>
> The lack of a parallel dev/offset based caching allows us to be lazy,
> and enlarges the bit limit on FS storage, though it does not help
> the inherent limit on file size (due to mapping).
>
> The lack of a parallel dev/offset results in the need for
> implementation of a "second chance cache" via ihash. Still, we
> will discard perfectly good pages from cache as a side effect of
> having no way to reassociate them with a vnode.
>
> The use of a global vnode pool instead of per FS mount instance vnode
> allocations damages cache locality. Combined with vclean, it also
> damages cache coherency.
>
> To repair:
>
> 1) Fix the stackability issues with the VFS interface itself,
> which will incidentally cause the VFS to more closely conform
> to the Heidemann Thesis design on which it is based. Currently
> it only implements a subset of the specified functionality.
>
> 2) Migrate the vnode locking to the vnode instead of the per FS
> inode; get rid of the second chance cache at the same time
> (the Lite2 code does some of this). The pointer should have
> been in the vnode, not the inode, from the very beginning.
>
> 3) Move the directory name cache out of the per FS code and
> into the lookup code.
>
> 4) Move the vnodes from the global pool; establish a per-FS
> vnode free routine.
>
> 5) Establish VOP_GETPAGE/VOP_PUTPAGE, etc...
>
> 6) Union mounts will then work without kludges in lookup, locking,
> and other code. They *could* be made to work with great, gross
> kludges and changes to at least 3 FS's (that I know of), but
> that's a kludge I won't do.
>
> Terry Lambert
> terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.
-- michaelh@cet.co.jp http://www.cet.co.jp CET Inc., Daiichi Kasuya BLDG 8F 2-5-12, Higashi Shinbashi, Minato-ku, Tokyo 105 Japan Tel: +81-3-3437-1761 Fax: +81-3-3437-1766

From owner-freebsd-fs Wed Jul 10 01:27:42 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id BAA21545 for fs-outgoing; Wed, 10 Jul 1996 01:27:42 -0700 (PDT) Received: from soleil.uvsq.fr (soleil.uvsq.fr [193.51.24.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id BAA21527; Wed, 10 Jul 1996 01:27:31 -0700 (PDT) Received: from guillotin.prism.uvsq.fr (guillotin.prism.uvsq.fr [193.51.25.1]) by soleil.uvsq.fr (8.7.5/jtpda-5.2) with ESMTP id KAA17976 ; Wed, 10 Jul 1996 10:27:28 +0200 (METDST) Received: from angrand.prism.uvsq.fr (angrand.prism.uvsq.fr [193.51.25.85]) by guillotin.prism.uvsq.fr (8.7.5/jtpda-5.2) with ESMTP id KAA00273 ; Wed, 10 Jul 1996 10:27:27 +0200 (MET DST) Received: from (son@localhost) by angrand.prism.uvsq.fr (8.7.5/jtpda-5.2) id LAA02630 ; Wed, 10 Jul 1996 11:30:07 +0200 (MET DST) Date: Wed, 10 Jul 1996 11:30:07 +0200 (MET DST) Message-Id: <199607100930.LAA02630@angrand.prism.uvsq.fr> From: Nicolas Souchu To: freebsd-fs@freebsd.org CC: freebsd-scsi@freebsd.org Subject: msdosfs and scsi Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk

I've developed a polling driver for a SCSI ZIP 100 drive connected to the parallel port. http://www.prism.uvsq.fr/~son/ppa3.html

Here is my question/problem: when polling, the system load is horrible... so I want to insert some tsleep() calls in the driver. In fact, when data is not available, the process which runs in the driver is scheduled with:

	s = splbio();
	tsleep(..., PRIBIO, "mywait", 1);
	splx(s);

BUT: doing this leads 2 concurrent processes to a deadlock.

	$ mount -t msdos /dev/sd0s4 /zip
	$ time dd if=/dev/zero of=/zip/file bs=8192 count=512 &
	$ ls -l /zip

dd is waiting on channel "getblk", ls is waiting on channel "msdhgt".
Debugging the driver shows that dd is scheduled and ls starts reading data from the disk. But then everything stops.

Should the driver be atomic until returning SUCCESSFULLY_QUEUED? Why? Why not? I can get more info if you need it...

nicolas -- Nicolas.Souchu@prism.uvsq.fr Laboratoire PRiSM - Versailles, FRANCE

From owner-freebsd-fs Wed Jul 10 10:45:31 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id KAA29185 for fs-outgoing; Wed, 10 Jul 1996 10:45:31 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id KAA29164; Wed, 10 Jul 1996 10:45:25 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id RAA04866; Wed, 10 Jul 1996 17:45:17 GMT Date: Thu, 11 Jul 1996 02:45:16 +0900 (JST) From: Michael Hancock To: freebsd-fs@FreeBSD.ORG cc: freebsd-current@FreeBSD.ORG Subject: Re: Fixing Union_mounts In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

Never mind. A mail search for "FS and Layering" on -current makes for some good reading. I have some absorbing to do.
-mike hancock

From owner-freebsd-fs Wed Jul 10 15:00:50 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id PAA19316 for fs-outgoing; Wed, 10 Jul 1996 15:00:50 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id PAA19296 for ; Wed, 10 Jul 1996 15:00:41 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id OAA27403; Wed, 10 Jul 1996 14:56:01 -0700 From: Terry Lambert Message-Id: <199607102156.OAA27403@phaeton.artisoft.com> Subject: Re: Fixing Union_mounts To: michaelh@cet.co.jp (Michael Hancock) Date: Wed, 10 Jul 1996 14:56:01 -0700 (MST) Cc: freebsd-fs@FreeBSD.ORG, terry@lambert.org In-Reply-To: from "Michael Hancock" at Jul 10, 96 11:26:40 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

> [Please trim off current and leave fs when replying]

OK.

> Terry posted this reply to the "making in /usr/src" thread. I'd like to
> see all this stackable fs stuff made usable.
>
> I have some questions on Terry's remedies items 2) and 4) below:
>
> 2) Moving vnode locking to the vnode from the per fs inode will
> help fix the stacking problems, but what will it do for future advanced
> file systems that need to have special locking requirements?

It will not impact them in any way. Specifically, the change is from:

	syscall()
	    VOP_LOCK()
	        return xxx_lock()
	            return kern_lock.c lock

to:

	syscall()
	    if( kern_lock.c lock == SUCCESS) {
	        if( VOP_LOCK() return xxx_lock() == FAILURE) {
	            kern_lock.c unlock
	        }
	    }

Which is to say that the per FS lock code gets the opportunity to veto the locking, but in the default case, will never veto. This leaves room for the complex FS's to veto at will.

The same goes for advisory locking.
It should be obvious how the lock veto will work for NFS client locking:

	if( local lock == SUCCESS) {
	    if( remote lock == FAILURE)
	        local unlock
	}

This has the advantage of preventing local conflicts from being appealed over the wire (and perhaps encountering race conditions as a result).

> 4) Moving the vnodes from the global pool to a per fs pool to improve
> locality of reference. Won't this make it hard to manage memory? How
> will efficient reclaim operations be implemented?

The memory is allocable per mount instance.

The problem with the recovery is in the divorce of the per FS in core inode from the per FS in core vnode, as implemented primarily by the vclean() and family of routines.

Specifically, there is already a "max open" limit on the allocated inodes, in the same respect, and with the same memory fragmentation issues coming up as a result.

The reclaim operation will be done by multiplexing ffs_vrele the same way ffs_vget, ffs_fhtovp, and ffs_vptofh (operations which also deal with per FS vnode-inode association) currently multiplex VFS_VGET, etc..

The net effect of a real cleanup (which will require something similar to this to be implemented, in any case) will be to actually reduce the number of cache misses -- since there are frequent cases where a vnode is recycled leaving the buffer cache contents in core. A subsequent read fails to detect this fact, and the disk is actually read instead of a cache hit occurring. This is a relatively huge overhead, and it is unnecessary.

This is only foundation work, since it requires a cleanup of the vclean/etc. interfaces in kern/vfs_subr.c. It will have *some* effect, in that an inode in the current ihash without an associated vnode (in the current implementation) will always have a recoverable vnode. This should be an immediate win for ihashget() cache hits, at least in those FS's that implement in core inode hashing (FFS/LFS/EXT2).

> This stacked fs stuff is really cool.
You can implement a simple undelete
> in the Union layer by making whiteout entries (See the 4.4 daemon book).
> This would only work for the duration of the mount unlike Novell's
> persistent transactional stuff, but still very useful.

Better than that. You could implement a persistent whiteout or umsdos type attribution in a file the same way, by stacking on top of the existing FS, and "swallowing" your own file to do the dirty deed. The duration would be permanent, assuming mount order is preserved.

This was the initial intent of the "mount over" capability: the mount of the underlying FS would take place, then the FS would be "probed" for stacking by looking for specific "swallow" files to determine if another FS should mount the FS again on the same mount point, interposing its layer.

This is specifically most useful right now for implementing a "quota" layer: ripping the quota code out of UFS in particular, and applying it to any FS which has a quota file on it. 8-).

> There are already crypto-fs implementations out there, but I'd like to see
> more; especially non-ITAR-restricted ones that can be used world-wide.

There is a file-compression (not block compression) FS, which two of John Heidemann's students implemented as part of a class project, as well.

There is also the concept of a persistent replicated network FS with intermittent network connectivity (basically, what the FICUS project implied) for nomadic computing and docking/undocking at geographically separate locations (I use a floating license from the West coast office to create a "PowerPoint" presentation, fly across the country, plug in my laptop to the East coast office network, and use a floating license from the East coast office to make the actual presentation to the board).

Regards,
Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present or previous employers.
From owner-freebsd-fs Wed Jul 10 19:45:39 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id TAA08262 for fs-outgoing; Wed, 10 Jul 1996 19:45:39 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id TAA08252 for ; Wed, 10 Jul 1996 19:45:33 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id CAA08956 for ; Thu, 11 Jul 1996 02:45:30 GMT Date: Thu, 11 Jul 1996 11:45:30 +0900 (JST) From: Michael Hancock To: freebsd-fs@FreeBSD.ORG Subject: Re: Fixing Union_mounts In-Reply-To: <199607102156.OAA27403@phaeton.artisoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

Thanks, the mail archives still left a lot of questions.

For now 2) is clear. I need to look at the code more to completely understand 4).

It would be interesting to hear from other FS/VM people so we can archive this discussion. Hopefully, agreements can be made towards fully realizing stackable mounts.

-mike hancock

On Wed, 10 Jul 1996, Terry Lambert wrote:
> > 2) Moving vnode locking to the vnode from the per fs inode will
> > help fix the stacking problems, but what will it do for future advanced
> > file systems that need to have special locking requirements?
>
> Which is to say that the per FS lock code gets the opportunity to veto
> the locking, but in the default case, will never veto. This leaves
> room for the complex FS's to veto at will.
>
> The same goes for advisory locking. It should be obvious how the
> lock veto will work for NFS client locking:
>
> This has the advantage of preventing local conflicts from being
> appealed over the wire (and perhaps encountering race conditions
> as a result).
> > 4) Moving the vnodes from the global pool to a per fs pool to improve
> > locality of reference.
Won't this make it hard to manage memory? How
> > will efficient reclaim operations be implemented?
>
> The memory is allocable per mount instance.
>
> The problem with the recovery is in the divorce of the per FS in core
> inode from the per FS in core vnode, as implemented primarily by the
> vclean() and family of routines.
>
> Specifically, there is already a "max open" limit on the allocated
> inodes, in the same respect, and with the same memory fragmentation
> issues coming up as a result.
>
> The reclaim operation will be done by multiplexing ffs_vrele the same
> way ffs_vget, ffs_fhtovp, and ffs_vptofh (operations which also deal
> with per FS vnode-inode association) currently multiplex VFS_VGET,
> etc..
>
> The net effect of a real cleanup (which will require something similar
> to this to be implemented, in any case) will be to actually reduce the
> number of cache misses -- since there are frequent cases where a vnode
> is recycled leaving the buffer cache contents in core. A subsequent
> read fails to detect this fact, and the disk is actually read instead
> of a cache hit occurring. This is a relatively huge overhead, and it
> is unnecessary.
>
> This is only foundation work, since it requires a cleanup of the
> vclean/etc. interfaces in kern/vfs_subr.c. It will have *some* effect,
> in that an inode in the current ihash without an associated vnode (in
> the current implementation) will always have a recoverable vnode. This
> should be an immediate win for ihashget() cache hits, at least in those
> FS's that implement in core inode hashing (FFS/LFS/EXT2).
>
> > This stacked fs stuff is really cool. You can implement a simple undelete
> > in the Union layer by making whiteout entries (See the 4.4 daemon book).
> > This would only work for the duration of the mount unlike Novell's
> > persistent transactional stuff, but still very useful.
>
> Better than that.
You could implement a persistent whiteout or umsdos
> type attribution in a file the same way, by stacking on top of the
> existing FS, and "swallowing" your own file to do the dirty deed.
> The duration would be permanent, assuming mount order is preserved.
>
> This was the initial intent of the "mount over" capability: the mount
> of the underlying FS would take place, then the FS would be "probed"
> for stacking by looking for specific "swallow" files to determine if
> another FS should mount the FS again on the same mount point
> interposing its layer.
>
> This is specifically most useful right now for implementing a "quota"
> layer: ripping the quota code out of UFS in particular, and applying
> it to any FS which has a quota file on it. 8-).
>
> > There are already crypto-fs implementations out there, but I'd like to see
> > more; especially non-ITAR-restricted ones that can be used world-wide.
>
> There is a file-compression (not block compression) FS, which two of
> John Heidemann's students implemented as part of a class project, as
> well.
>
> There is also the concept of a persistent replicated network FS with
> intermittent network connectivity (basically, what the FICUS project
> implied) for nomadic computing and docking/undocking at geographically
> separate locations (I use a floating license from the West coast office
> to create a "PowerPoint" presentation, fly across the country, plug
> in my laptop to the East coast office network, and use a floating
> license from the East coast office to make the actual presentation
> to the board).
>
> Regards,
> Terry Lambert
> terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.
> -- michaelh@cet.co.jp http://www.cet.co.jp CET Inc., Daiichi Kasuya BLDG 8F 2-5-12, Higashi Shinbashi, Minato-ku, Tokyo 105 Japan Tel: +81-3-3437-1761 Fax: +81-3-3437-1766 From owner-freebsd-fs Wed Jul 10 20:11:49 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id UAA09673 for fs-outgoing; Wed, 10 Jul 1996 20:11:49 -0700 (PDT) Received: from who.cdrom.com (who.cdrom.com [204.216.27.3]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id UAA09668 for ; Wed, 10 Jul 1996 20:11:47 -0700 (PDT) Received: from ccs.sogang.ac.kr (ccs.sogang.ac.kr [163.239.1.1]) by who.cdrom.com (8.6.12/8.6.11) with ESMTP id UAA09347 for ; Wed, 10 Jul 1996 20:11:39 -0700 Received: from cslsun10.sogang.ac.kr by ccs.sogang.ac.kr (8.7.5/Sogang) id LAA24730; Thu, 11 Jul 1996 11:54:31 +0900 (KST) Received: by cslsun10.sogang.ac.kr (4.1/SMI-4.1) id AA06369; Thu, 11 Jul 96 11:51:23 KST Date: Thu, 11 Jul 96 11:51:23 KST From: heo@cslsun10.sogang.ac.kr (Heo Sung Gwan) Message-Id: <9607110251.AA06369@cslsun10.sogang.ac.kr> Apparently-To: Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk Hi, From owner-freebsd-fs Wed Jul 10 20:37:49 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id UAA10699 for fs-outgoing; Wed, 10 Jul 1996 20:37:49 -0700 (PDT) Received: from dyson.iquest.net (dyson.iquest.net [198.70.144.127]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id UAA10694 for ; Wed, 10 Jul 1996 20:37:43 -0700 (PDT) Received: (from root@localhost) by dyson.iquest.net (8.7.5/8.6.9) id WAA07219; Wed, 10 Jul 1996 22:37:11 -0500 (EST) From: "John S. 
Dyson" Message-Id: <199607110337.WAA07219@dyson.iquest.net> Subject: Re: Fixing Union_mounts To: michaelh@cet.co.jp (Michael Hancock) Date: Wed, 10 Jul 1996 22:37:11 -0500 (EST) Cc: freebsd-fs@FreeBSD.ORG In-Reply-To: from "Michael Hancock" at Jul 11, 96 11:45:30 am Reply-To: dyson@FreeBSD.ORG X-Mailer: ELM [version 2.4 PL24 ME8] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

> Thanks, the mail archives still left a lot of questions.
>
> For now 2) is clear. I need to look at the code more to completely
> understand 4).
>
> It would be interesting to hear from other FS/VM people so we can archive
> this discussion. Hopefully, agreements can be made towards fully
> realizing stackable mounts.

My two cents (pence, lira, yen, etc...):

I hope to look at this thread this weekend. I know that we need to get off our duffs and start making progress on the FS front. My FreeBSD time is right now tied up on making the swapon/swapoff stuff real.

There is action about to happen on the Jeffrey Hsu Lite-2 stuff, and I heard that Kirk's ordered-delay writes project might be starting. This weekend I am going to dedicate a day or so to work with people to understand all of this so that I can help contribute. DG needs to get involved also, and I think that his time is freeing up (that damn -stable release has tied up very, very valuable resources.)

My language (composition) skills suck, but I have excellent reading skills. Reading Terry's stuff is sometimes very difficult :-). Makes me think that I have 2nd grade reading skills at times... I have convinced myself that he is right about the management of the namei buffers, but there is MUCH MUCH more to do!!!

Some people have been commenting (in private and -core email) that we are moving too fast... On some fronts we are moving like molasses, and I sure would like to see more progress.
I am kind of the VM person (well, DG and I are), but do not feel nearly as competent on the FS front. As an ABSOLUTE minimum, I can provide a lot of "nice" hooks into the VM system for filesystem memory management (LFS really needs help, for example.)

I really see the need for a fairly close, collaborative effort on the FS code structure and filesystems. However, there are, at times, diverging opinions on how things should be done. We need to get organized!!! I am at my physical limit now, working a regular job, needing to find another SO or pseudo-SO, and of course my most important SO needs attention (the FreeBSD issues that I already have committed to.)

I don't know if this position is acceptable to all, but I am thinking that we need to (eventually) empower an FS development team, like we have kind of done with the VM stuff.

Sorry for my rambling, John

From owner-freebsd-fs Wed Jul 10 21:57:31 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id VAA14643 for fs-outgoing; Wed, 10 Jul 1996 21:57:31 -0700 (PDT) Received: from ccs.sogang.ac.kr (ccs.sogang.ac.kr [163.239.1.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id VAA14638 for ; Wed, 10 Jul 1996 21:57:27 -0700 (PDT) Received: from cslsun10.sogang.ac.kr by ccs.sogang.ac.kr (8.7.5/Sogang) id NAA27547; Thu, 11 Jul 1996 13:54:37 +0900 (KST) Received: by cslsun10.sogang.ac.kr (4.1/SMI-4.1) id AA06370; Thu, 11 Jul 96 11:52:55 KST Date: Thu, 11 Jul 96 11:52:55 KST From: heo@cslsun10.sogang.ac.kr (Heo Sung Gwan) Message-Id: <9607110252.AA06370@cslsun10.sogang.ac.kr> To: undisclosed-recipients:; Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

Hi, I want to know something about lfs (log-structured filesystem).
From owner-freebsd-fs Wed Jul 10 22:45:25 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id WAA18419 for fs-outgoing; Wed, 10 Jul 1996 22:45:25 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id WAA18412 for ; Wed, 10 Jul 1996 22:45:22 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id FAA10203; Thu, 11 Jul 1996 05:44:58 GMT Date: Thu, 11 Jul 1996 14:44:58 +0900 (JST) From: Michael Hancock Reply-To: Michael Hancock To: Heo Sung Gwan cc: freebsd-fs@FreeBSD.ORG Subject: lfs) In-Reply-To: <9607110252.AA06370@cslsun10.sogang.ac.kr> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

On Thu, 11 Jul 1996, Heo Sung Gwan wrote:
> Hi,
> I want to know something about lfs(log-structured filesystem).

The sources are available with FreeBSD. You can also get the following books:

Unix Internals: The New Frontiers (by Uresh Vahalia) -- there's lots of stuff in there about file systems.

The Design and Implementation of the 4.4BSD Operating System (by McKusick et al.)

You might want to poke around http://www.usenix.org to find related papers. A good source would also be http://deas.harvard.edu (look at Margo Seltzer's work).
-mike hancock

From owner-freebsd-fs Wed Jul 10 23:02:29 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id XAA19340 for fs-outgoing; Wed, 10 Jul 1996 23:02:29 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id XAA19332 for ; Wed, 10 Jul 1996 23:02:26 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id GAA10311 for ; Thu, 11 Jul 1996 06:02:24 GMT Date: Thu, 11 Jul 1996 15:02:24 +0900 (JST) From: Michael Hancock Reply-To: Michael Hancock To: freebsd-fs@FreeBSD.ORG Subject: Re: Fixing Union_mounts In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

Sorry, I can't shut up. I'm fuzzy on 4), and will be until I read the sources more. I just want to back up and talk about the design objectives.

The fathers of 4.4 thought having a global vnode pool vs. partitioning the pools per fs was a win for kernel memory management when several different file systems are in use. Your design goals seem to come from an SMP perspective, which means we need to think differently to understand what you're saying.

If we step back and look at this from the point of view of the 4.4 implementers, what are the consequences of moving away from a global vnode pool? What are the wins?
-mike hancock From owner-freebsd-fs Wed Jul 10 23:37:49 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id XAA20656 for fs-outgoing; Wed, 10 Jul 1996 23:37:49 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id XAA20651 for ; Wed, 10 Jul 1996 23:37:46 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id GAA10559; Thu, 11 Jul 1996 06:37:33 GMT Date: Thu, 11 Jul 1996 15:37:33 +0900 (JST) From: Michael Hancock To: Heo Sung Gwan cc: freebsd-fs@FreeBSD.ORG Subject: Re: lfs) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk On Thu, 11 Jul 1996, Michael Hancock wrote: > papers. A good source would also be http://deas.harvard.edu (look at > Margo Seltzer's work). Oops. http://www.deas.harvard.edu -mh From owner-freebsd-fs Wed Jul 10 23:58:48 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id XAA21481 for fs-outgoing; Wed, 10 Jul 1996 23:58:48 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id XAA21474; Wed, 10 Jul 1996 23:58:45 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id GAA10724; Thu, 11 Jul 1996 06:58:42 GMT Date: Thu, 11 Jul 1996 15:58:42 +0900 (JST) From: Michael Hancock To: dyson@FreeBSD.ORG cc: freebsd-fs@FreeBSD.ORG Subject: Re: Fixing Union_mounts In-Reply-To: <199607110337.WAA07219@dyson.iquest.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk Thanks, I guess it's bad timing with all the release work happening now. On Wed, 10 Jul 1996, John S. 
Dyson wrote: > I hope to look at this thread this weekend. I know that we need to get > off our duffs starting to make progress on the FS front. My FreeBSD time > is right now tied up on making the swapon/swapoff stuff real. > > There is action about to happen on the Jeffery Hsu Lite-2 stuff, and > I heard that Kirk's ordered-delay writes project might be starting. This Yes, the Lite2 stuff is needed to proceed further. Regarding Delayed-Ordered Writes, here's an excerpt from Terry's Usenet posting on the UnixWare group: >Contrast this with the UnixWare 2.x UFS, which uses Delayed >Ordered Writes. These require significant changes to each >FS's structure to implement, and do not scale reentrancy >per vnode across multiple processors for a particular vnode >buffer. They are about 35% slower than soft updates under >loading, and tend to have bad cache effects. I agree that things should probably slow down, but so that we can sit down and do more *designing*. DOW is a performance optimization, and before doing that I think we should take a harder look at the framework that serves as the foundation for all further work. I'd hate to see the same mistakes made in SVR4/MP go into 4.4BSD. Identifying these mistakes might be hard, but I think we should try.
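[Editor's note: the common ground between Delayed Ordered Writes and soft updates is an ordering invariant: metadata that a disk block depends on must reach stable storage before the block itself. Here is a toy C sketch of that invariant, with invented names; it is neither UnixWare's DOW nor the soft-updates implementation, both of which track full dependency graphs.]

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer with at most one dependency. */
struct buf {
    int         id;       /* which disk block this is */
    int         written;  /* has it reached "disk" yet? */
    struct buf *dep;      /* block that must hit disk before this one */
};

static int write_order[16];
static int nwrites;

static void bwrite(struct buf *bp) {  /* simulate the disk write */
    write_order[nwrites++] = bp->id;
    bp->written = 1;
}

/* Flush a buffer, honouring the invariant: the dependency (e.g. a new
 * inode) goes to disk before the dependent block (e.g. the directory
 * block naming it), so a crash can never leave a name pointing at an
 * uninitialized inode. */
static void flush(struct buf *bp) {
    if (bp->dep != NULL && !bp->dep->written)
        flush(bp->dep);
    if (!bp->written)
        bwrite(bp);
}
```

The design debate is about where this dependency information lives and when flushing happens, which is why Terry argues the two schemes differ sharply in locking and cache behaviour even though they enforce the same invariant.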
-mike hancock From owner-freebsd-fs Fri Jul 12 17:29:30 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id RAA26864 for fs-outgoing; Fri, 12 Jul 1996 17:29:30 -0700 (PDT) Received: from veda.is (root@ubiq.veda.is [193.4.230.60]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id RAA26838; Fri, 12 Jul 1996 17:28:07 -0700 (PDT) Received: (from adam@localhost) by veda.is (8.7.5/8.7.3) id AAA07477; Sat, 13 Jul 1996 00:27:59 GMT From: Adam David Message-Id: <199607130027.AAA07477@veda.is> Subject: strangest weirdness To: freebsd-current@freebsd.org Date: Sat, 13 Jul 1996 00:27:54 +0000 (GMT) Cc: freebsd-fs@freebsd.org X-Mailer: ELM [version 2.4ME+ PL22 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Well I have just seen what seems to be an unusual filesystem glitch. I was doing 'make depend' in 2 kernel directories concurrently, and at the same time as another kernel 'make all' was getting towards the end of its processing. Both instances of 'make depend' broke by invoking the editor 'ex' on an empty temporary file, following the first invocation of 'mkdep'. No other instances of 'ex' were running at the time as far as I can tell. This was with an NFS /usr, and I believe that the 'make' executable was reinstalled after the 'make all' was started but before the 'make depend' was started. (yes, it's called stress testing. ;) I have also noticed that executables dump core often on client machines when the files on the fileserver have been updated "under their feet". Okay I know "if it hurts, don't do that", but why do these glitches occur? 
-- Adam David From owner-freebsd-fs Sat Jul 13 02:10:34 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id CAA25278 for fs-outgoing; Sat, 13 Jul 1996 02:10:34 -0700 (PDT) Received: from irz301.inf.tu-dresden.de (irz301.inf.tu-dresden.de [141.76.1.11]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id CAA25259; Sat, 13 Jul 1996 02:10:27 -0700 (PDT) Received: from sax.sax.de by irz301.inf.tu-dresden.de (8.6.12/8.6.12-s1) with ESMTP id LAA26066; Sat, 13 Jul 1996 11:10:13 +0200 Received: (from uucp@localhost) by sax.sax.de (8.6.12/8.6.12-s1) with UUCP id LAA10257; Sat, 13 Jul 1996 11:10:13 +0200 Received: (from j@localhost) by uriah.heep.sax.de (8.7.5/8.6.9) id KAA22966; Sat, 13 Jul 1996 10:21:11 +0200 (MET DST) From: J Wunsch Message-Id: <199607130821.KAA22966@uriah.heep.sax.de> Subject: Re: strangest weirdness To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org Date: Sat, 13 Jul 1996 10:21:11 +0200 (MET DST) Cc: adam@veda.is (Adam David) Reply-To: joerg_wunsch@uriah.heep.sax.de (Joerg Wunsch) In-Reply-To: <199607130027.AAA07477@veda.is> from Adam David at "Jul 13, 96 00:27:54 am" X-Phone: +49-351-2012 669 X-PGP-Fingerprint: DC 47 E6 E4 FF A6 E9 8F 93 21 E0 7D F9 12 D6 4E X-Mailer: ELM [version 2.4ME+ PL17 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk As Adam David wrote: > I have also noticed that executables dump core often on client machines when > the files on the fileserver have been updated "under their feet". Okay I know > "if it hurts, don't do that", but why do these glitches occur? Terry will certainly jump in now and explain to you that it would be better to copy the entire executable into local swap instead of relying on the ability to page it in from the NFS server. The latter is what we're doing right now -- so you are simply not expected to remove it on the server.
The Unix semantics of ``a file will only be removed once the last reference to it has disappeared'' don't work over NFS, since the server simply doesn't know (and cannot know, due to the statelessness of the protocol) which clients still hold references on some file. These semantics are emulated in the case where you unlink a file on the client that still has other references, by renaming the file on the server first and removing it later. -- cheers, J"org joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE Never trust an operating system you don't have sources for. ;-) From owner-freebsd-fs Sat Jul 13 21:09:14 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id VAA05394 for fs-outgoing; Sat, 13 Jul 1996 21:09:14 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id VAA05378; Sat, 13 Jul 1996 21:09:11 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id VAA06010; Sat, 13 Jul 1996 21:03:16 -0700 From: Terry Lambert Message-Id: <199607140403.VAA06010@phaeton.artisoft.com> Subject: Re: strangest weirdness To: joerg_wunsch@uriah.heep.sax.de Date: Sat, 13 Jul 1996 21:03:16 -0700 (MST) Cc: freebsd-current@freebsd.org, freebsd-fs@freebsd.org, adam@veda.is In-Reply-To: <199607130821.KAA22966@uriah.heep.sax.de> from "J Wunsch" at Jul 13, 96 10:21:11 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk > > I have also noticed that executables dump core often on client machines when > > the files on the fileserver have been updated "under their feet". Okay I know > > "if it hurts, don't do that", but why do these glitches occur?
> > Terry will certainly jump in now and explain to you that it would be > better to copy the entire executable into local swap instead of > relying on the ability to page it in from the NFS server. The latter > is what we're doing right now -- so you are simply not expected to > remove it on the server. The Unix semantics of ``a file will only be > removed once the last reference to it has disappeared'' don't work over > NFS, since the server simply doesn't know (and cannot know, due to the > statelessness of the protocol) which clients still hold references on > some file. These semantics are emulated in the case where you > unlink a file on the client that still has other references, by > renaming the file on the server first and removing it later. Actually, you could implement a simple distributed cache coherency protocol for executables with a slight modification of the rpc.statd code in current and a minor change to the NFS client. It wouldn't be an NFS spec compliant implementation afterwards, but it would solve the problem. I would like to see a flag in the mount structure for FS's, inherited from the FS type, so that the dev of an inode about to be exec'ed may be dereferenced through the mount struct to decide if the image is coming from local stable storage, local removable storage, or network storage. I would also like to see an option where an executable image could be forced into local memory. If swap is available, it would be considered to be local memory. I would like to see a default of the current behaviour, with sysctl-based controls to cause the exec to force the image into local memory in the local removable media case or the network storage case, or both, under user configuration. To solve your problem (and for my personal defaults selection), you would set the flag for the exec-from-network-storage case.
Mach, Linux, SunOS, Solaris, SVR4, SCO Xenix, etc., all have the behaviour of using an image for swap store, and when the image is modified without notification (the image is modified on the NFS server case) or when the image is "deleted" without notification (the CDROM/floppy removal case), the client system is the one that suffers. What makes this particularly onerous in NFS is that one NFS client can intentionally crash another NFS client of the same server, given knowledge of what programs are running and a writable server store. In addition, this method can be used to hack an otherwise secure client, typically by rewriting the target page on the server so that when the accept completes on sendmail, it throws up a shell, or something similar. Sendmail is SUID root, so it is a bad example, but telnetd (/usr/libexec/telnetd) is a good candidate for this hack, since the ruserok() check does not specifically block "bin", only "root", and telnetd is owned by "bin". Since this bin-owned binary will be run by root (inetd) on a client connect, it is an ideal place to hack. Besides the security issues, it's just plain annoying to have the client quit functioning, or hang pending a page-in from the server when the server has gone down (diskless/dataless configurations of SunOS are frequently sworn at for this failing). Regards, Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
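[Editor's note: Terry's proposal reduces to a small policy check at exec time. The C sketch below illustrates it; every name here -- the enum, the struct fields, the policy variables -- is invented for illustration and is not FreeBSD's actual API.]

```c
#include <assert.h>

/* Where a filesystem's backing store lives; per the proposal, this
 * would be inherited into the mount structure from the FS type. */
enum storage {
    ST_LOCAL_STABLE,     /* local fixed disk */
    ST_LOCAL_REMOVABLE,  /* CDROM, floppy */
    ST_NETWORK           /* NFS and friends */
};

struct mount { enum storage m_storage; };
struct vnode { struct mount *v_mount; };

/* Policy knobs, as if settable through sysctl.  The defaults preserve
 * the current behaviour: page the image in from its backing store. */
static int exec_copy_removable = 0;
static int exec_copy_network = 0;

/* At exec time: should this image be forced into local memory (or
 * local swap), instead of being demand-paged from its backing store
 * later, when the store may have changed or gone away? */
static int exec_should_copy(const struct vnode *vp) {
    switch (vp->v_mount->m_storage) {
    case ST_LOCAL_REMOVABLE:
        return exec_copy_removable;
    case ST_NETWORK:
        return exec_copy_network;
    default:
        return 0;
    }
}
```

Turning on the network-storage knob would address both the core dumps Adam reported and the client-crash attack Terry describes, at the cost of swap space for every networked executable.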