From owner-freebsd-fs Sun Nov 21 11:25:10 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from excalibur.lps.ens.fr (excalibur.lps.ens.fr [129.199.120.3]) by hub.freebsd.org (Postfix) with ESMTP id 5812F1591A for ; Sun, 21 Nov 1999 11:24:29 -0800 (PST) (envelope-from Thierry.Besancon@lps.ens.fr)
Received: from (besancon@localhost) by excalibur.lps.ens.fr (8.9.3/jtpda-5.3.1) id UAA16636 ; Sun, 21 Nov 1999 20:24:17 +0100 (MET)
To: "Mark W. Krentel"
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: running linux binaries from ext2fs partition
References: <199911202017.PAA03794@dreamscape.com>
Cc: besancon@lps.ens.fr
From: Thierry.Besancon@lps.ens.fr
Date: 21 Nov 1999 20:24:15 +0100
In-Reply-To: "Mark W. Krentel"'s message of Sat, 20 Nov 1999 15:17:58 -0500 (EST)
Message-ID:
Lines: 85
X-Mailer: Gnus v5.3/Emacs 19.34
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

"Mark W. Krentel" wrote (on Sat, 20 Nov 1999 15:17:58 -0500 (EST)):
>>
>> Is it possible to run linux (or freebsd) binaries directly from a
>> local ext2fs partition?
>>
>> ...
>>
>> While we're on the subject, on what filesystem types is it ok to run
>> binaries? Local freebsd (UFS), NFS, and cdrom should all work, right?
>> Are there others?
>>

Hello

I don't know the answer to the last question but here's what I found.

I set up X terminals using FreeBSD 3.3-RELEASE. /tmp is an MFS:

Filesystem                  1K-blocks     Used   Avail Capacity  Mounted on
129.199.120.250:/              127023    31651   85211    27%    /
mfs:29                            959      668     215    76%    /conf/etc
/conf/etc                         959      668     215    76%    /etc
129.199.120.250:/usr           190543   153042   22258    87%    /usr
129.199.120.250:/usr/local    2846396  1958786  659899    75%    /usr/local
mfs:61                           3935     1431    2190    40%    /var
/var/tmp                         3935     1431    2190    40%    /tmp
mfs:91                           1511       47    1344     3%    /dev

The X terminal runs without any swap. /etc/rc.sysctl confirms it as well:

sysctl -w vm.swap_enabled=0

Whenever I run an executable residing in the mfs /tmp, it just hangs
the kernel:

# cp /bin/ls /tmp
# df /tmp/.
Filesystem  1K-blocks  Used  Avail Capacity  Mounted on
/var/tmp         3935  1432   2189    40%    /tmp
# /tmp/ls
(workstation freezes)

Here's the panic:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x3e
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc022bf14
stack pointer           = 0x10:0xc4546bc8
frame pointer           = 0x10:0xc4546ca4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt disabled, resume, IOPL = 0
current process         = 355 (csh)
interrupt mask          = net tty bio cam
kernel: type 12 trap, code = 0
Stopped at      ffs_vptofh+0xfe0:       cmpw    $0x2,0x3e(%edx)

and the trace:

db> trace
ffs_vptofh(c4546d5c,c4514300,1000,0,c4546cf4) at ffs_vptofh+0xfe0
end(c4546d5c) at 0xc087c485
vnode_pager_freepage(c4559a2c,c4546db8,1,0,c4546df8) at vnode_pager_freepage+0x556
vm_pager_get_pages(c4559a2c,c4546db8,1,0,c4546f18) at vm_pager_get_pages+0x1f
exec_map_first_page(c4546e94,c44c55a8,c02fe464,0,4) at exec_map_first_page+0xba
execve(c44c55a0,c4546f94,80922e0,80940000,8085000) at execve+0x19e
syscall(27,27,8085000,8094000,bfbffbb0) at syscall+0x187
Xint0x80_syscall() at Xint0x80_syscall+0x2c

(not too deep)

Given that I have no swap (vm.swap_enabled=0), it is not easy to supply
a vmcore. But I can provide any help needed, as I can reproduce the crash
at will.

If someone has a clue on how to fix that...
Thierry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 24 10:21:19 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 2E342152E8; Wed, 24 Nov 1999 10:21:07 -0800 (PST) (envelope-from tlambert@usr08.primenet.com) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id LAA21327; Wed, 24 Nov 1999 11:19:54 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp05.primenet.com, id smtpdAAASQaazP; Wed Nov 24 11:19:35 1999 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id LAA19803; Wed, 24 Nov 1999 11:19:52 -0700 (MST) From: Terry Lambert Message-Id: <199911241819.LAA19803@usr08.primenet.com> Subject: Re: namei() and freeing componentnames To: eivind@FreeBSD.ORG (Eivind Eklund) Date: Wed, 24 Nov 1999 18:19:52 +0000 (GMT) Cc: fs@FreeBSD.ORG In-Reply-To: <19991112000359.A256@bitbox.follo.net> from "Eivind Eklund" at Nov 12, 99 00:03:59 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > I would like to make this reflexive - "symmetrical" allocation and > free, like it presently is supposed to be with SAVESTART (but isn't - > there are approximately one billion bugs in the code). > > I suspect that for some filesystems (though none of the present ones), > it might be necessary to do more than a > zfree(namei_zone,cnp->cn_pnbuf) in order to free up all the relevant > data. In order to support this, we'd have to introduce a new VOP - > tentatively called VOP_RELEASEND(). Unfortunately, this comes with a > performance penalty. A VOP_RELEASEND() call is a bad idea. The path name buffers should be considered an opaque resource by the underlying filesystem. 
One can think of the path name buffers as containing three parts:

1)	Allocated information which may be referenced by a VFS, but
	not deallocated or otherwise modified.

2)	Context-free state. This is state information which is
	present in the structure, and can be modified by a VFS
	according to globally applicable rules.

3)	Contextual state. This is state information which is
	present in the structure, and can be modified by a VFS
	according to contract with upper level code.

Currently, there are no VFSs which support, require, or use
contextual state. Such things will probably be necessary to
support multiple simultaneous name spaces which are not lazy-bound
(e.g. supporting the 8.3 and long name name spaces for newly
created files in a VFAT32FS or NTFS), but this is a special case
for which other FreeBSD support is currently missing anyway.

I would delay the introduction of a VOP dealing with path name
buffers until such time as contextual state that requires VFS
based allocation of arbitrary structure data becomes necessary.

Even then, it may only be necessary to add two structure elements:
one that is a void pointer, and one that identifies the memory pool
from which the data referenced by a non-NULL void pointer was
allocated (one wonders why a pointer can not be asked to which pool
it belongs, so that pool identity is not required on free).

A common technique used in such cases is to allocate the data
pointed to by an allocated structure contiguous to the structure
(i.e. in the same allocation), and have the internal structure
pointer elements point into the memory following the structure.
This allows the pointer to be freed opaquely, with all concomitant
allocations, e.g.:

	struct foo {
		char	*string;
		...
	};
	struct foo *p;

	/* one allocation holds the structure and the string after it */
	p = malloc(sizeof(struct foo) + strlen(str) + 1);
	p->string = ((char *)p) + sizeof(struct foo);
	strcpy(p->string, str);
	...
	free(p);	/* releases the structure and the string together */

You say that you want it to be reflexive and symmetrical; path
name buffers are allocated by the VFS consumer. To achieve
this goal, they must also be deallocated by the VFS consumer.

One of the largest barriers to transactional VFSs in BSD
at this point is that VOP_ABORTOP() frees the path name
buffer, and it should not.

> It also allows an evil hack:
> The NFS code is rather incestuous with the VFS system, in order to
> minimize the amount of cached data during NFS requests.

It is, like the system call layer, a consumer of the VFS. It
is not NFS' fault that the system call layer has historically
been treated as a "more equal pig" when it comes to consuming
the VFS.

I am well aware of the path name buffer switch that occurs in
the NFS server. The simple answer is "caller frees".

Once the path name buffer allocation and deallocation has been
rationalized, the NFS code becomes much simpler: as a consumer
of the VFS interface, it allocates and deallocates the path
name buffers that it uses, just like any other VFS consumer.

The main grossness comes from the use of "goto" statements
and targets in the macro definitions. This can be alleviated
by incorporating the path name free into the "bail out" case,
and preinitializing the path name buffer pointer to NULL so
that it can be tested for validity on a premature exit.

> One side of
> this is that it seems to throw away the vnode we'd like to use for
> VOP_RELEASEND() - before it wants to throw away the componentname.

Yes. If you examine the vfs_lookup.c code, you will see that it
avoids this by hiding the act in a mutual function recursion;
this is the same one that it uses to do symlink expansion in
place in the path name buffer, to avoid having to allocate more
buffer space and to avoid exceeding the 1024 byte path length
limit on the allocated path name buffer.

> Is it too evil?
> I'm of two minds - I don't like messing more than
> necessary with the NFS code (and am not sure I could do the messing
> without performance impact), but I'm not exactly ecstatic about the
> hack, either.

It's too evil, from a lot of perspectives.

I think that the per-VFS lookup private resource release is
premature feature creep, and it's probably not justified when
a relatively opaque (or fully opaque, if the memory pool identity
didn't need to be cached) pointer could take its place.

I believe the NFS code could be handled without a performance
impact; there are already path component name buffers being
allocated and deallocated in the cases you are worried about,
they're just not being allocated and deallocated symmetrically.

I also think that the primary evil of the additional VOP is
that it takes the code further from where it needs to be.

The abomination that is NFS cookies is a result of overloading
the VOP_LOOKUP code in order to obtain directory restart when
the underlying FS's directory entry (struct dirent) is larger
than the one that you proxy over the wire.

I think that the correct way to deal with this is to define
an externalization VOP separate from VOP_LOOKUP, which will
do the data externalization for you. This would have the side
effect of NFS-izing all future FSs, since the same code could
be used both by NFS and the system call layer.

Currently, the system call layer does not do the "cookie dance",
and so that code is relatively unmaintained. If all VFS
consumers consumed the same code path, the code in the path
would be maintained.

Anyway, that's my two cents...

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
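
To make the "caller frees" rule from the message above concrete, here is a minimal user-space sketch. It is only an illustration under stated assumptions: toy_consumer(), toy_lookup(), toy_live_buffers and TOY_PATH_MAX are hypothetical names, malloc()/free() stand in for the kernel's zone allocator, and nothing here is taken from the actual NFS or namei() code. It shows the two points made above: the buffer pointer is preinitialized to NULL, and the free lives in a single "bail out" path owned by the consumer.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PATH_MAX 1024	/* stands in for the 1024 byte path limit */

int toy_live_buffers;		/* buffers allocated and not yet freed */

/*
 * Hypothetical lookup layer: it may read or rewrite the buffer
 * contents, but it never frees the buffer.
 */
static int
toy_lookup(char *pnbuf)
{
	return (pnbuf[0] == '/') ? 0 : ENOENT;
}

/*
 * Hypothetical VFS consumer: it allocates the path name buffer and is
 * the only code that frees it, on success and on every error path.
 */
int
toy_consumer(const char *path)
{
	char *pnbuf = NULL;	/* NULL until allocated, so "out" can test it */
	int error;

	if (strlen(path) >= TOY_PATH_MAX) {
		error = ENAMETOOLONG;
		goto out;	/* premature exit: nothing to free yet */
	}
	pnbuf = malloc(TOY_PATH_MAX);
	if (pnbuf == NULL) {
		error = ENOMEM;
		goto out;
	}
	toy_live_buffers++;
	strcpy(pnbuf, path);
	error = toy_lookup(pnbuf);
out:
	if (pnbuf != NULL) {	/* single bail-out point does the free */
		free(pnbuf);
		toy_live_buffers--;
	}
	return (error);
}
```

Whatever toy_lookup() returns, the allocation and the free sit in the same function, so toy_live_buffers is back at zero after every call; that is the symmetry being asked for.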
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 24 11: 4:49 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id 7AF86153E8; Wed, 24 Nov 1999 11:04:09 -0800 (PST) (envelope-from tlambert@usr08.primenet.com) Received: (from daemon@localhost) by smtp02.primenet.com (8.8.8/8.8.8) id MAA03761; Wed, 24 Nov 1999 12:03:15 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp02.primenet.com, id smtpd003665; Wed Nov 24 12:03:07 1999 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id LAA21738; Wed, 24 Nov 1999 11:55:04 -0700 (MST) From: Terry Lambert Message-Id: <199911241855.LAA21738@usr08.primenet.com> Subject: Re: namei() and freeing componentnames To: eivind@FreeBSD.ORG (Eivind Eklund) Date: Wed, 24 Nov 1999 18:55:04 +0000 (GMT) Cc: ezk@cs.columbia.edu, fs@FreeBSD.ORG In-Reply-To: <19991118153220.E45524@bitbox.follo.net> from "Eivind Eklund" at Nov 18, 99 03:32:20 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Yes, this is the intent. > > The problem I'm finding with VOP_RELEASEND() is that namei() can > return two different vps - the dvp (directory vp) and the actual vp > (inside the directory dvp points at), and that neither of these are > always available. What gets returned is based on the flags passed down. I think that trying to encapsulate this transparently, so that any namei() operation that succeeds or fails can be freed in its entirety without resort to flags specific code in the caller is a mistake. I don't think you can reasonably do this. 
One issue that occurs to me is that namei() itself, and not the
underlying VOP_LOOKUP code, should be the one to reference the
path component name cache. If the underlying VFS doesn't want
the cache hit to occur without notifying it of the event, then
it needs to not enter the data in the cache. This would simplify
a large amount of code.

The other simplification, which is organizational and could,
using inline functions, add effectively zero code overhead,
is to separate the lookup operations by request type. Whether
or not something wants the parent directory back has much to do
with whether it is a create or rename operation, and little to
do with anything else. Operations which intend to modify the
returned directory entry are very distinct from those merely
doing a lookup.

I have often felt that much of the mess that create/rename/delete/open
variant behaviour causes should be addressed by moving the
complexity to upper level code.

> Progress report: Based on current rate of progress, it looks like I'll
> be able to have patches ready for (my personal) testing sunday (or
> *possibly* saturday, but most likely not). Depending on how
> testing/debugging works out, the patches will most likely be ready for
> public testing sometime next week. I'll need help with NFS testing.

Heh. This is the same stumbling block I hit: needing help
with NFS testing.

I created, and I believe it was Peter who updated it, a testing
framework that can detect kernel memory leaks from user space,
and which exercised the entire branch path for the
namei()/nameifree() cases.

This would probably be a good thing for someone to use, since
it will identify the branch path in which any memory leaks are
occurring.

> Forward view: I'm undecided on the next step. Possibilities:
>
> (1) Change the way locking is specified to make it feasible to test
>     locking patches properly, and change the assertion generation to
>     generate better assertions.
>     This will probably require changing
>     VOP_ISLOCKED() to be able to take a process parameter, and return
>     different values based on whether an exclusive lock is held by that
>     process or by another process. The present behaviour will be
>     available by passing NULL for this parameter.
>
>     Presently, running multiple processes does not work properly, as
>     the assertions do not really assert the right things.
>
>     These changes are necessary to properly debug the use of locks,
>     which I again believe is necessary for stacking layers (which I
>     would like to work in 4.0, but I don't know if I will be able to
>     have ready).

This would be nice; I still believe most of the vnode and the
advisory locking code can move to upper layers. I think it is
the responsibility of the stacking layers to propagate locks,
and the only place that this is really an issue is on fan-in
or fan-out.

Please keep an eye towards not precluding Jeremy Allison's work
on a kernel opportunistic locking interface, since it's really
needed to do hosted OS/host OS coherency properly (e.g. Samba
clients must obey UNIX locks, and UNIX applications must obey
those of Samba). This is similar to what NFS clients and local
applications must do to interoperate, and is the primary purpose
of the LEASE interface.

> (2) Change the behaviour of VOP_LOOKUP() to "eat as much as you can,
>     and return how much that was" rather than "Eat a single path
>     component; we have already decided what this is."
>     This allows different types of namespaces, and it allows
>     optimizations in VOP_LOOKUP() when several steps in the traversal
>     are inside a single filesystem (and hey - who mounts a
>     new filesystem on every directory they see, anyway?)

The path component buffer mechanism already specifies this behaviour
as one of its initial design requirements, so I think this is already
taken care of.
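
For readers following the "eat as much as you can, and return how much that was" idea, here is a minimal user-space sketch of such a contract. It is an assumption-laden toy, not the VOP_LOOKUP() interface: toy_lookup_multi() is a hypothetical name, it works on a plain string rather than on vnodes and a componentname, and the choice to hand ".." back to the caller is just one example of a component the filesystem cannot safely resolve on its own (it may cross a mount point or a chroot root).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical multi-component lookup: consume leading components of
 * "path" for as long as this toy filesystem can resolve them itself,
 * and return the number of bytes consumed.  The caller (the namei()
 * equivalent) resumes at path + return value.
 */
size_t
toy_lookup_multi(const char *path)
{
	size_t consumed = 0;

	while (path[consumed] == '/')		/* skip leading separators */
		consumed++;

	for (;;) {
		const char *comp = path + consumed;
		size_t len = strcspn(comp, "/");	/* this component's length */

		if (len == 0)			/* nothing left to eat */
			break;
		if (len == 2 && comp[0] == '.' && comp[1] == '.')
			break;			/* leave ".." for the caller */
		consumed += len;
		while (path[consumed] == '/')	/* swallow trailing separators */
			consumed++;
	}
	return (consumed);
}
```

For "usr/local/../bin" the function consumes "usr/local/" and stops, so the caller sees ".." next and can apply its own mount point and root directory checks before calling down again.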
What does not happen is that lookups that will take place in a
single VFS are held down in that VFS for the entire traversal;
instead, they pop up to namei().

I don't think you can get rid of this without destroying the
"union" option (not the same as the "unionfs"), and without
damaging the ability to cover mount points and to chroot or
do symlink expansion, or deal with POSIX namespace escape.

The original reason for allowing this behaviour at all, according
to Heidemann's thesis, is to permit an underlying FS to "eat as
much as you want", as opposed to "eat as much as you can". This
was used in proxy VFS stacking layers, since a proxy layer knows
that it owns the entire tree inferior to the current component.

One "low hanging fruit" optimization that can be made is to
_always_ set fdp->fd_rdir to the process's current
root directory; this avoids the NULL/non-NULL test, so long
as it is inherited correctly on fork, and set for init.

This would be very nice for many other reasons... 8-).

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 24 22:22:24 1999 Delivered-To: freebsd-fs@freebsd.org Received: from europa.dreamscape.com (europa.dreamscape.com [206.64.128.147]) by hub.freebsd.org (Postfix) with ESMTP id 11E5A14CEC for ; Wed, 24 Nov 1999 22:22:17 -0800 (PST) (envelope-from krentel@dreamscape.com) Received: from dreamscape.com (sA19-p21.dreamscape.com [209.217.200.84]) by europa.dreamscape.com (8.8.5/8.8.4) with ESMTP id BAA27780; Thu, 25 Nov 1999 01:21:57 -0500 (EST) X-Dreamscape-Track-A: sA19-p21.dreamscape.com [209.217.200.84] X-Dreamscape-Track-B: Thu, 25 Nov 1999 01:21:57 -0500 (EST) Received: (from krentel@localhost) by dreamscape.com (8.9.3/8.9.3) id BAA19286; Thu, 25 Nov 1999 01:20:16 -0500 (EST) (envelope-from krentel) Date: Thu, 25 Nov 1999 01:20:16 -0500 (EST) From: "Mark W. Krentel" Message-Id: <199911250620.BAA19286@dreamscape.com> To: freebsd-fs@FreeBSD.ORG, Thierry.Besancon@lps.ens.fr Subject: Re: running linux binaries from ext2fs partition Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Thierry Besancon writes: > Whenever I run an executable residing in the mfs /tmp, it justs hangs > the kernel : I also mount a MFS on /tmp. I tried copying ls, find, emacs onto /tmp and ran them from there. Works fine for me in 3.3-stable. But there's something odd in your mounts: > Filesystem 1K-blocks Used Avail Capacity Mounted on > ... > mfs:61 3935 1431 2190 40% /var > /var/tmp 3935 1431 2190 40% /tmp You're remounting a subdir of /var onto /tmp? Wouldn't a symlink be a better choice here? That is, don't mount /var/tmp onto /tmp. Instead, make /tmp a symlink that points to /var/tmp. Try that and see if you still get the crashes. But I'm still wondering about running binaries from ext2fs. I got a panic when I tried this (with a linux binary). I wouldn't think of running programs from a msdos fs, but why not ext2fs? 
Is this supported, or has anyone else tried running linux or freebsd binaries from an ext2fs partition? --Mark Krentel To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 25 9:22:17 1999 Delivered-To: freebsd-fs@freebsd.org Received: from ns1.yes.no (ns1.yes.no [195.204.136.10]) by hub.freebsd.org (Postfix) with ESMTP id 14C3C14EA5 for ; Thu, 25 Nov 1999 09:22:03 -0800 (PST) (envelope-from eivind@bitbox.follo.net) Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218]) by ns1.yes.no (8.9.3/8.9.3) with ESMTP id SAA27640; Thu, 25 Nov 1999 18:22:01 +0100 (CET) Received: (from eivind@localhost) by bitbox.follo.net (8.8.8/8.8.6) id SAA40090; Thu, 25 Nov 1999 18:22:00 +0100 (MET) Date: Thu, 25 Nov 1999 18:22:00 +0100 From: Eivind Eklund To: Terry Lambert Cc: fs@FreeBSD.ORG Subject: Re: namei() and freeing componentnames Message-ID: <19991125182159.B602@bitbox.follo.net> References: <19991112000359.A256@bitbox.follo.net> <199911241819.LAA19803@usr08.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <199911241819.LAA19803@usr08.primenet.com>; from tlambert@primenet.com on Wed, Nov 24, 1999 at 06:19:52PM +0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, Nov 24, 1999 at 06:19:52PM +0000, Terry Lambert wrote: > You say that you want it to be reflexive and symmetrical; path > name buffers are allocated by the VFS consumer. To achieve > this goal, they must also be deallocated by the VFS consumer. I have a series of progressive patches towards this goal available at http://www.freebsd.org/~eivind/ None of these are expected to in any way be near working, and I misread the namei() code enough that there are a bunch of VOP_RELEASENDs that need to be removed. 
Right now, after seeing how much chaos the VOP_RELEASEND stuff turned
into and how many places other code is repeated, I'm tempted to go for
a NDFREE() which can free struct nameidata, *including vrele/vput'ing
acquired vp*, and which takes flags to indicate if it is to leave some
resources behind.

Fortunately, I now have diffs for most of the places where this would
be needed, and have worked with the code in those areas recently, so it
hopefully won't be that much work to convert the diffs to this model
(which would mean that the VOP_RELEASEND that is in those patches
disappears).

> One of the largest barriers to transaction using VFSs in BSD
> at this point is that the VOP_ABORTOP() frees the path name
> buffer, and it should not.

I've noticed :)

In the present patches, I am plain slaying VOP_ABORTOP(), on the basis
of it not being used for anything anymore (all it did in all
filesystems we have was to free the pathname), and I intend to have it
re-introduced correctly when/if we get a transactional FS. I was
intending to discuss this once I was at a point where the patches were
actually runnable (along with other decisions I've made while actually
hacking the code), though feel free to offer views on it (since
you've brought it into the conversation).

I'll get back to the rest of your message (and the other one) later; I
just wanted to give at least some indication that I am not a black
hole.

Eivind.

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Thu Nov 25 12: 6: 6 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from excalibur.lps.ens.fr (excalibur.lps.ens.fr [129.199.120.3]) by hub.freebsd.org (Postfix) with ESMTP id AD0F414DD1 for ; Thu, 25 Nov 1999 12:05:58 -0800 (PST) (envelope-from Thierry.Besancon@lps.ens.fr)
Received: from (besancon@localhost) by excalibur.lps.ens.fr (8.9.3/jtpda-5.3.1) id VAA29616 ; Thu, 25 Nov 1999 21:05:42 +0100 (MET)
To: "Mark W.
Krentel"
Cc: freebsd-fs@FreeBSD.ORG, Thierry.Besancon@lps.ens.fr
Subject: Re: running linux binaries from ext2fs partition
References: <199911250620.BAA19286@dreamscape.com>
From: Thierry.Besancon@lps.ens.fr
Date: 25 Nov 1999 21:05:41 +0100
In-Reply-To: "Mark W. Krentel"'s message of Thu, 25 Nov 1999 01:20:16 -0500 (EST)
Message-ID:
Lines: 47
X-Mailer: Gnus v5.3/Emacs 19.34
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

"Mark W. Krentel" wrote (on Thu, 25 Nov 1999 01:20:16 -0500 (EST)):
>> > Whenever I run an executable residing in the mfs /tmp, it just hangs
>> > the kernel :
>>
>> I also mount a MFS on /tmp. I tried copying ls, find, emacs onto /tmp
>> and ran them from there. Works fine for me in 3.3-stable. But there's
>> something odd in your mounts:

Do remember that I have no swap available.

>> > Filesystem 1K-blocks Used Avail Capacity Mounted on
>> > ...
>> > mfs:61 3935 1431 2190 40% /var
>> > /var/tmp 3935 1431 2190 40% /tmp
>>
>> You're remounting a subdir of /var onto /tmp? Wouldn't a symlink be
>> a better choice here? That is, don't mount /var/tmp onto /tmp. Instead,
>> make /tmp a symlink that points to /var/tmp. Try that and see if you
>> still get the crashes.

Well, I do it the way /etc/rc.diskless2 does:

...
if [ ! -h /tmp -a ! -h /var/tmp ]; then
	mount_null /var/tmp /tmp
fi
...

Sometimes, you have to trust someone... I trust the FreeBSD guys ;-)

I'll give the symlink a try but, anyway, I found a way to make the
kernel crash at will. If it crashes, it means it is buggy somewhere
and it needs a fix, not a workaround...

>> But I'm still wondering about running binaries from ext2fs. I got a
>> panic when I tried this (with a linux binary). I wouldn't think of
>> running programs from a msdos fs, but why not ext2fs? Is this supported,
>> or has anyone else tried running linux or freebsd binaries from an
>> ext2fs partition?

I don't have ext2fs...
Thierry Besancon To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 25 16:13: 3 1999 Delivered-To: freebsd-fs@freebsd.org Received: from sv01.cet.co.jp (sv01.cet.co.jp [210.171.56.2]) by hub.freebsd.org (Postfix) with ESMTP id 4F9ED14D37; Thu, 25 Nov 1999 16:12:59 -0800 (PST) (envelope-from michaelh@cet.co.jp) Received: from localhost (michaelh@localhost) by sv01.cet.co.jp (8.9.3/8.9.3) with SMTP id AAA03313; Fri, 26 Nov 1999 00:12:57 GMT Date: Fri, 26 Nov 1999 09:12:57 +0900 (JST) From: Michael Hancock To: Eivind Eklund Cc: Terry Lambert , fs@FreeBSD.ORG Subject: Re: namei() and freeing componentnames In-Reply-To: <19991125182159.B602@bitbox.follo.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Right now, after seeing how much chaos the VOP_RELEASEND stuff turned > into and how many places other code is repeated, I'm tempted to go for > a NDFREE() which can free struct nameidata, *including vrele/vput'ing > aquired vp*, and which takes flags to indicate if it is to leave some > resources behind. NDFREE() makes sense, though I'd do the vrele/vput part later as a separate step. 
Regards, Mike To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 26 3: 5:22 1999 Delivered-To: freebsd-fs@freebsd.org Received: from ns1.yes.no (ns1.yes.no [195.204.136.10]) by hub.freebsd.org (Postfix) with ESMTP id 4FD5714F7F for ; Fri, 26 Nov 1999 03:05:13 -0800 (PST) (envelope-from eivind@bitbox.follo.net) Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218]) by ns1.yes.no (8.9.3/8.9.3) with ESMTP id MAA10001; Fri, 26 Nov 1999 12:05:11 +0100 (CET) Received: (from eivind@localhost) by bitbox.follo.net (8.8.8/8.8.6) id MAA43398; Fri, 26 Nov 1999 12:05:11 +0100 (MET) Date: Fri, 26 Nov 1999 12:05:11 +0100 From: Eivind Eklund To: Michael Hancock Cc: Terry Lambert , fs@FreeBSD.ORG Subject: Re: namei() and freeing componentnames Message-ID: <19991126120511.E602@bitbox.follo.net> References: <19991125182159.B602@bitbox.follo.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: ; from michaelh@cet.co.jp on Fri, Nov 26, 1999 at 09:12:57AM +0900 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, Nov 26, 1999 at 09:12:57AM +0900, Michael Hancock wrote: > > Right now, after seeing how much chaos the VOP_RELEASEND stuff turned > > into and how many places other code is repeated, I'm tempted to go for > > a NDFREE() which can free struct nameidata, *including vrele/vput'ing > > aquired vp*, and which takes flags to indicate if it is to leave some > > resources behind. > > NDFREE() makes sense, though I'd do the vrele/vput part later as a > separate step. In normal circumstances, I might agree. However, we have a 4.0 architectural changes freeze coming up, and if we are to handle this right, we should have free inhibition flags rather than flags saying what to free (in order to be able to change the definition without changing all callers, and in order to make the code obvious at the point of call). 
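
As a rough illustration of the difference (a sketch only; the TOY_NDF_* flags, struct toy_nameidata and toy_ndfree() below are hypothetical names for this example, not the definitions from the patches, and plain reference counters stand in for vrele()/vput() on real vnodes):

```c
#include <stdlib.h>

/* Hypothetical inhibition flags: each one names a resource to KEEP. */
#define TOY_NDF_KEEP_PNBUF	0x01
#define TOY_NDF_KEEP_VP		0x02
#define TOY_NDF_KEEP_DVP	0x04

/* Toy stand-in for struct nameidata; the fields are illustrative only. */
struct toy_nameidata {
	char	*pnbuf;		/* path name buffer */
	int	*vp_refs;	/* reference count of the looked-up vnode */
	int	*dvp_refs;	/* reference count of its parent directory */
};

/*
 * Release everything the lookup left behind, except what the flags
 * inhibit.  Passing 0 means "free it all".
 */
void
toy_ndfree(struct toy_nameidata *ndp, unsigned int flags)
{
	if ((flags & TOY_NDF_KEEP_PNBUF) == 0 && ndp->pnbuf != NULL) {
		free(ndp->pnbuf);
		ndp->pnbuf = NULL;
	}
	if ((flags & TOY_NDF_KEEP_VP) == 0 && ndp->vp_refs != NULL) {
		(*ndp->vp_refs)--;	/* stands in for vrele()/vput() */
		ndp->vp_refs = NULL;
	}
	if ((flags & TOY_NDF_KEEP_DVP) == 0 && ndp->dvp_refs != NULL) {
		(*ndp->dvp_refs)--;	/* stands in for vrele()/vput() */
		ndp->dvp_refs = NULL;
	}
}
```

With inhibition flags, a caller that passes 0 keeps working unchanged if another resource is later added to the structure, and a call such as toy_ndfree(&nd, TOY_NDF_KEEP_VP) says at the point of call exactly what the caller still intends to use.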
This means that if we do not do it now, we really should wait to get close to 5.0-RELEASE to do this, or we need to sync the change into the 4.0 API after release, in violation of our releases-have-stable-APIs policy. I would like to avoid both of these options. Eivind. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 26 7:16:28 1999 Delivered-To: freebsd-fs@freebsd.org Received: from ns1.yes.no (ns1.yes.no [195.204.136.10]) by hub.freebsd.org (Postfix) with ESMTP id 0100715050 for ; Fri, 26 Nov 1999 07:16:19 -0800 (PST) (envelope-from eivind@bitbox.follo.net) Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218]) by ns1.yes.no (8.9.3/8.9.3) with ESMTP id QAA14078; Fri, 26 Nov 1999 16:16:19 +0100 (CET) Received: (from eivind@localhost) by bitbox.follo.net (8.8.8/8.8.6) id QAA44301; Fri, 26 Nov 1999 16:16:18 +0100 (MET) Date: Fri, 26 Nov 1999 16:16:18 +0100 From: Eivind Eklund To: Terry Lambert Cc: ezk@cs.columbia.edu, fs@FreeBSD.ORG Subject: Re: namei() and freeing componentnames Message-ID: <19991126161618.B44210@bitbox.follo.net> References: <19991118153220.E45524@bitbox.follo.net> <199911241855.LAA21738@usr08.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <199911241855.LAA21738@usr08.primenet.com>; from tlambert@primenet.com on Wed, Nov 24, 1999 at 06:55:04PM +0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, Nov 24, 1999 at 06:55:04PM +0000, Terry Lambert wrote: > > Yes, this is the intent. > > > > The problem I'm finding with VOP_RELEASEND() is that namei() can > > return two different vps - the dvp (directory vp) and the actual vp > > (inside the directory dvp points at), and that neither of these are > > always available. > > What gets returned is based on the flags passed down. 
> I think
> that trying to encapsulate this transparently, so that any
> namei() operation that succeeds or fails can be freed in its
> entirety without resort to flags specific code in the caller
> is a mistake. I don't think you can reasonably do this.

What it presently frees is only the path component buffer.

> One issue that occurs to me is that namei() itself, and not the
> underlying VOP_LOOKUP code, should be the one to reference the
> path component name cache. If the underlying VFS doesn't want
> the cache hit to occur without notifying it of the event, then
> it needs to not enter the data in the cache. This would simplify
> a large amount of code.

Where? How? I do not quite get this - could you give a few more
details or pointers to some code it would modify?

> The other simplification, which is organizational, and could,
> using inline functions, be effectively NULL additional code
> overhead, is to separate the lookup operations by request
> type. Whether or not something wants the parent directory
> back has much to do with whether it is a create or rename
> operation, and little to do with anything else. Operations
> which intend to modify the returned directory entry are very
> distinct from those merely doing a lookup.

I have thought of it, and have been very tempted to do it. I've not
yet tried to find out how much code impact it would have; there are a
few namei()'s that are at a different layer than the NDINIT()s, and
I've chosen to do the frees at the same layer as the NDINIT(), as
that is where the way the allocation is done gets decided (namei()
is dependent on the flags).

> I have often felt that much of the mess create/rename/delete/open
> variant behaviour causes should be addressed by moving the
> complexity to upper level code.

I tend to agree, but I am not certain how easy it will be, nor whether
it will end up really clean - I may look at this once I've done the
other cleanups. I feel it is less important than the rest.
[On changing the level of detail of lock specifications in vnode_if.src, in order to be able to generate proper lock assertions] > Please keep an eye towards not precluding Jeremy Allison's work > on a kernel opportunistic locking interface, since it's really > needed to do hosted OS/host OS coherency properly (e.g. Samba > clients must obey UNIX locks, and UNIX applications must obey > those of Samba). This is similar to what NFS clients and local > applications must do to interoperate, and is the primary purpose > of the LEASE interface. I must admit to not understanding the lease interface at all. I do not think any of the work I am doing at the moment will impact it; I only deal with vnode locks. > > (2) Change the behaviour of VOP_LOOKUP() to "eat as much as you can, > > and return how much that was" rather than "Eat a single path > > component; we have already decided what this is." > > This allows different types of namespaces, and it allows > > optimizations in VOP_LOOKUP() when several steps in the traversal > > are inside a single filesystem (and hey - who mounts a > > new filesystem on every directory they see, anyway?) > > The path component buffer mechanism already specifies this behaviour > as one of its initial design requirements, so I think this is already > taken care of. > > What does not happen is that lookups that will take place in a > single VFS are not held down in that VFS for the entire traversal, > but instead pop up to namei(). This was what I wanted to get rid of. > I don't think you can get rid of this, without destroying the > "union" option (not the same as the "unionfs"), and without > damaging the ability to cover mount points and to chroot or > do symlink expansion, or deal with POSIX namespace escape. I wanted to do it in order to be able to deal with POSIX namespace escapes, as the logic for how to handle the namespace would be pushed downwards, but I might not have thought all the implications through.
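To make the "eat as much as you can, and return how much that was" idea concrete, here is a minimal user-space sketch. It is not VOP_LOOKUP(): the toy_fs structure and both functions are hypothetical, and the "filesystem" is just a list of the relative paths it can resolve. The point is only the calling convention, where the lookup reports how many bytes of the path it consumed and the caller resumes from there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy filesystem: a flat list of the paths it knows about. */
struct toy_fs {
	const char *const *known;	/* relative paths, e.g. "usr/local" */
	size_t nknown;
};

/* Return nonzero if the first len bytes of path name an entry in fs. */
static int
toy_fs_has(const struct toy_fs *fs, const char *path, size_t len)
{
	for (size_t i = 0; i < fs->nknown; i++)
		if (strlen(fs->known[i]) == len &&
		    memcmp(fs->known[i], path, len) == 0)
			return (1);
	return (0);
}

/*
 * Consume as many leading components of path as fs can resolve and
 * return the number of bytes consumed (0 if none).  The caller would
 * continue the traversal at path + return value.
 */
static size_t
toy_lookup(const struct toy_fs *fs, const char *path)
{
	size_t consumed = 0;
	size_t pos = 0;

	assert(fs != NULL && path != NULL);
	while (path[pos] != '\0') {
		/* Find the end of the next component. */
		size_t end = pos;
		while (path[end] != '\0' && path[end] != '/')
			end++;
		/* Stop at the first prefix this filesystem cannot resolve. */
		if (!toy_fs_has(fs, path, end))
			break;
		consumed = end;
		/* Step over the separator, if there is one. */
		pos = (path[end] == '/') ? end + 1 : end;
	}
	return (consumed);
}
```

With known paths "usr" and "usr/local", a lookup of "usr/local/bin" consumes the nine bytes of "usr/local" in one call and leaves "/bin" for whoever handles the rest, rather than popping back up after every component.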
I'll admit to working "pseudo-blind" - I do not understand all details and architecture of the code, and try to understand detail by detail as I need to in order to bring things forward. > One "low hanging fruit" optimization that can be made is to > _always_ set the fdp->fd_rdir to the process's current > root directory; this avoids the NULL/non-NULL test, so long > as it is inherited correctly on fork, and set for init. > > This would be very nice for many other reasons... 8-). Those are the patches that are on your home page on freefall, right? I've been planning to commit them, I've just not gotten around to it. Eivind. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 26 7:21:20 1999 Delivered-To: freebsd-fs@freebsd.org Received: from ns1.yes.no (ns1.yes.no [195.204.136.10]) by hub.freebsd.org (Postfix) with ESMTP id CB31514C9C for ; Fri, 26 Nov 1999 07:21:08 -0800 (PST) (envelope-from eivind@bitbox.follo.net) Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218]) by ns1.yes.no (8.9.3/8.9.3) with ESMTP id QAA14154; Fri, 26 Nov 1999 16:21:07 +0100 (CET) Received: (from eivind@localhost) by bitbox.follo.net (8.8.8/8.8.6) id QAA44337; Fri, 26 Nov 1999 16:21:07 +0100 (MET) Date: Fri, 26 Nov 1999 16:21:07 +0100 From: Eivind Eklund To: Terry Lambert Cc: fs@FreeBSD.ORG Subject: Re: namei() and freeing componentnames Message-ID: <19991126162107.C44210@bitbox.follo.net> References: <19991112000359.A256@bitbox.follo.net> <199911241819.LAA19803@usr08.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <199911241819.LAA19803@usr08.primenet.com>; from tlambert@primenet.com on Wed, Nov 24, 1999 at 06:19:52PM +0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, Nov 24, 1999 at 06:19:52PM +0000, Terry Lambert wrote: > The main grossness comes from the use of "goto" statements > and targets
in the macro definitions. This can be alleviated > by incorporating the path name free into the "bail out" case, > and preinitializing the path name buffer pointer to NULL so > that it can be tested for validity on a premature exit. I've already done this in my patches :) > I also think that the primary evil of the additional VOP is that > it takes the code further from where it needs to be. The abomination > that is NFS cookies is a result of overloading the VOP_LOOKUP code > in order to obtain directory restart, when the underlying FS's > directory entry block entry (struct dirent) is larger than the > one that you proxy over the wire. > > I think that the correct way to deal with this is to define an > externalization VOP separate from the VOP_LOOKUP, which will > do the data externalization for you. I do not get this. Could you give a few more details of what change(s) you are thinking of? E.g., a short description of what VOP you want, including what input parameters and output parameters you see for it? Eivind. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message