From owner-freebsd-fs  Sun Nov 21 11:25:10 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from excalibur.lps.ens.fr (excalibur.lps.ens.fr [129.199.120.3])
	by hub.freebsd.org (Postfix) with ESMTP id 5812F1591A
	for <freebsd-fs@FreeBSD.ORG>; Sun, 21 Nov 1999 11:24:29 -0800 (PST)
	(envelope-from Thierry.Besancon@lps.ens.fr)
Received: from (besancon@localhost)
          by excalibur.lps.ens.fr (8.9.3/jtpda-5.3.1) id UAA16636
          ; Sun, 21 Nov 1999 20:24:17 +0100 (MET)
To: "Mark W. Krentel" <krentel@dreamscape.com>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: running linux binaries from ext2fs partition
References: <199911202017.PAA03794@dreamscape.com>
Cc: besancon@lps.ens.fr
From: Thierry.Besancon@lps.ens.fr
Date: 21 Nov 1999 20:24:15 +0100
In-Reply-To: "Mark W. Krentel"'s message of Sat, 20 Nov 1999 15:17:58 -0500 (EST)
Message-ID: <wnn7ljblkbk.fsf@excalibur.lps.ens.fr>
Lines: 85
X-Mailer: Gnus v5.3/Emacs 19.34
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Dixit "Mark W. Krentel" <krentel@dreamscape.com> (le Sat, 20 Nov 1999 15:17:58 -0500 (EST)) :

>> 
>> Is it possible to run linux (or freebsd) binaries directly from a
>> local ext2fs partition?
>> 
>> ...
>> 
>> While we're on the subject, on what filesystem types is it ok to run
>> binaries?  Local freebsd (UFS), NFS, and cdrom should all work, right?
>> Are there others?
>> 


	Hello 

I don't know the answer to the last question but here's what I found.

I set up X terminals using FreeBSD 3.3-RELEASE.

/tmp is a MFS :

Filesystem                 1K-blocks     Used    Avail Capacity  Mounted on
129.199.120.250:/             127023    31651    85211    27%    /
mfs:29                           959      668      215    76%    /conf/etc
/conf/etc                        959      668      215    76%    /etc
129.199.120.250:/usr          190543   153042    22258    87%    /usr
129.199.120.250:/usr/local   2846396  1958786   659899    75%    /usr/local
mfs:61                          3935     1431     2190    40%    /var
/var/tmp                        3935     1431     2190    40%    /tmp
mfs:91                          1511       47     1344     3%    /dev

The X terminal runs without any swap.
/etc/rc.sysctl confirms it as well :
	sysctl -w vm.swap_enabled=0


Whenever I run an executable residing in the mfs /tmp, it just hangs
the kernel :

# cp /bin/ls /tmp
# df /tmp/.
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/var/tmp         3935     1432     2189    40%    /tmp
# /tmp/ls
(workstation freezes)

Here's the panic :

Fatal trap 12 : page fault while in kernel mode
fault virtual address = 0x3e
fault code            = supervisor read, page not present
instruction pointer   = 0x8:0xc022bf14
stack pointer         = 0x10:0xc4546bc8
frame pointer         = 0x10:0xc4546ca4
code segment          = base 0x0, limit 0xfffff, type 0x1b
                      = DPL 0, pres 1, def32 1, gran 1
processor eflags      = interrupt disabled, resume, IOPL = 0
current process       = 355 (csh)
interrupt mask        = net tty bio cam
kernel : type 12 trap, code = 0
Stopped at ffs_vptofh+0xfe0: cmpw $0x2,0x3e(%edx)

and the trace :

db> trace
ffs_vptofh(c4546d5c,c4514300,1000,0,c4546cf4) at ffs_vptofh+0xfe0
end(c4546d5c) at 0xc087c485
vnode_pager_freepage(c4559a2c,c4546db8,1,0,c4546df8) at vnode_pager_freepage+0x556
vm_pager_get_pages(c4559a2c,c4546db8,1,0,c4546f18) at vm_pager_get_pages+0x1f
exec_map_first_page(c4546e94,c44c55a8,c02fe464,0,4) at exec_map_first_page+0xba
execve(c44c55a0,c4546f94,80922e0,80940000,8085000) at execve+0x19e
syscall(27,27,8085000,8094000,bfbffbb0) at syscall+0x187
Xint0x80_syscall() at Xint0x80_syscall+0x2c

(not too deep)

Given I have no swap (vm.swap_enabled=0), it is not easy to supply a
vmcore.  But I can provide any help needed, as I can reproduce the
crash at will.

If someone has a clue on how to fix that...

	Thierry


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


From owner-freebsd-fs  Wed Nov 24 10:21:19 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2E342152E8; Wed, 24 Nov 1999 10:21:07 -0800 (PST)
	(envelope-from tlambert@usr08.primenet.com)
Received: (from daemon@localhost)
	by smtp05.primenet.com (8.9.3/8.9.3) id LAA21327;
	Wed, 24 Nov 1999 11:19:54 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp05.primenet.com, id smtpdAAASQaazP; Wed Nov 24 11:19:35 1999
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id LAA19803;
	Wed, 24 Nov 1999 11:19:52 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199911241819.LAA19803@usr08.primenet.com>
Subject: Re: namei() and freeing componentnames
To: eivind@FreeBSD.ORG (Eivind Eklund)
Date: Wed, 24 Nov 1999 18:19:52 +0000 (GMT)
Cc: fs@FreeBSD.ORG
In-Reply-To: <19991112000359.A256@bitbox.follo.net> from "Eivind Eklund" at Nov 12, 99 00:03:59 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> I would like to make this reflexive - "symmetrical" allocation and
> free, like it presently is supposed to be with SAVESTART (but isn't -
> there are approximately one billion bugs in the code).
> 
> I suspect that for some filesystems (though none of the present ones),
> it might be necessary to do more than a
> zfree(namei_zone,cnp->cn_pnbuf) in order to free up all the relevant
> data.  In order to support this, we'd have to introduce a new VOP -
> tentatively called VOP_RELEASEND().  Unfortunately, this comes with a
> performance penalty.


A VOP_RELEASEND() call is a bad idea.

The path name buffers should be considered an opaque resource by
the underlying filesystem.

One can think of the path name buffers as containing three parts:

1)	Allocated information which may be referenced by a VFS,
	but not deallocated or otherwise modified.

2)	Context-free state.  This is state information which
	is present in the structure, and can be modified by a VFS
	according to globally applicable rules.

3)	Contextual state.  This is state information which is
	present in the structure, and can be modified by a VFS
	according to contract with upper level code.

Currently, there are no VFSs which support, require, or use
contextual state.  Such things will probably be necessary
to support multiple simultaneous name spaces which are not
lazy-bound (e.g. supporting the 8.3 and long name name spaces
for newly created files in a VFAT32FS or NTFS), but this is a
special case for which other FreeBSD support is currently
missing anyway.

I would delay the introduction of a VOP dealing with path
name buffers until such time as contextual state that
requires VFS based allocation of arbitrary structure data
becomes necessary.  Even then, it may be only necessary to
realize two additional structure elements: one that has a
void pointer, and one that has the memory pool from which
the data referenced by a non-NULL void pointer was allocated
(one wonders why a pointer can not be asked to which pool
it belongs, so that pool identity is not required on free).

A common technique used in such cases is to allocate the
data pointed to by an allocated structure contiguous to the
structure (e.g. in the same allocation), and have the internal
structure pointer elements point into memory following the
structure.  This allows the pointer to be freed opaquely,
with all concomitant allocations, e.g.:

	struct foo {
		char	*string;
		...
	};

	struct foo *p;

	p = malloc( sizeof(struct foo) + strlen(str) + 1);
	p->string = ((char *)p) + sizeof(struct foo);
	strcpy( p->string, str);

	...

	free( p);
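
For reference, the fragment above can be fleshed out into a
self-contained sketch.  The names foo_create() and foo_destroy() are
hypothetical and are used here only for illustration; the point is
that a single free() releases the structure and the string together.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure, named after the fragment above. */
struct foo {
	char	*string;	/* points into the same allocation */
};

/*
 * Allocate a struct foo and a copy of str in one block.
 * Returns NULL if the allocation fails.
 */
struct foo *
foo_create(const char *str)
{
	size_t len;
	struct foo *p;

	assert(str != NULL);
	len = strlen(str) + 1;
	p = malloc(sizeof(*p) + len);
	if (p == NULL)
		return (NULL);
	/* The string lives immediately after the structure. */
	p->string = (char *)(p + 1);
	memcpy(p->string, str, len);
	return (p);
}

/* One free() releases both the structure and the string. */
void
foo_destroy(struct foo *p)
{
	free(p);
}
```

A caller would pair foo_create(str) with foo_destroy(p) and never
needs to know that p->string was allocated at all.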


You say that you want it to be reflexive and symmetrical; path
name buffers are allocated by the VFS consumer.  To achieve
this goal, they must also be deallocated by the VFS consumer.

One of the largest barriers to transaction using VFSs in BSD
at this point is that the VOP_ABORTOP() frees the path name
buffer, and it should not.


> It also allows an evil hack:
> The NFS code is rather incestuous with the VFS system, in order to
> minimize the amount of cached data during NFS requests.

It is, like the system call layer, a consumer of the VFS.  It is
not NFS' fault that the system call layer has historically been
treated as a "more equal pig" when it comes to consuming the VFS.

I am well aware of the path name buffer switch that occurs in the
NFS server.  The simple answer is "caller frees".  Once the path
name buffer allocation and deallocation has been rationalized,
the NFS code becomes much simpler: as a consumer of the VFS
interface, it allocates and deallocates the path name buffers
that it uses, just like any other VFS consumer.

The main grossness comes from the use of "goto" statements
and targets in the macro definitions.  This can be alleviated
by incorporating the path name free into the "bail out" case,
and preinitializing the path name buffer pointer to NULL so
that it can be tested for validity on a premature exit.
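
A minimal user-space sketch of that "caller frees" bail-out pattern
follows.  Everything in it is an assumption made for illustration:
consumer_op(), do_lookup() and PATHBUF_LEN are hypothetical names, and
malloc()/free() stand in for whatever allocator the kernel would use.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define	PATHBUF_LEN	1024	/* assumed buffer size for this sketch */

/* Hypothetical stand-in for a lookup that may fail part-way through. */
static int
do_lookup(const char *pnbuf)
{
	return (pnbuf[0] == '\0' ? ENOENT : 0);
}

/*
 * A VFS consumer that owns its path name buffer: it allocates the
 * buffer, and every exit path funnels through the single "bail"
 * label, which frees it.
 */
int
consumer_op(const char *path)
{
	char *pnbuf = NULL;	/* NULL until allocated, so bail can test it */
	int error;

	assert(path != NULL);
	if (strlen(path) >= PATHBUF_LEN) {
		error = ENAMETOOLONG;
		goto bail;	/* premature exit: nothing allocated yet */
	}
	pnbuf = malloc(PATHBUF_LEN);
	if (pnbuf == NULL) {
		error = ENOMEM;
		goto bail;
	}
	strcpy(pnbuf, path);
	error = do_lookup(pnbuf);
bail:
	if (pnbuf != NULL)
		free(pnbuf);
	return (error);
}
```

Because the pointer starts out NULL, the bail-out code is the same
whether the failure happened before or after the allocation.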


> One side of
> this is that it seems to throw away the vnode we'd like to use for
> VOP_RELEASEND() - before it wants to throw away the componentname.

Yes.

If you examine the vfs_lookup.c code, you will see that it
avoids this by hiding the act in a mutual function recursion;
this is the same one that it uses to do symlink expansion in
place in the path name buffer to avoid having to allocate more
buffer space, and to avoid exceeding the 1024 byte path length
limit on the allocated path name buffer.


> Is it too evil?  I'm of two minds - I don't like messing more than
> necessary with the NFS code (and am not sure I could do the messing
> without performance impact), but I'm not exactly ecstatic about the
> hack, either.

It's too evil, from a lot of perspectives.  I think that the
per-VFS lookup private resource release is a premature feature
creep, and it's probably not justified, when a relatively opaque
(or opaque, if the memory pool identity didn't need to be cached)
pointer could take its place.


I believe the NFS code could be handled without a performance
impact; there are already path component name buffers being
allocated and deallocated in the cases you are worried about,
they're just not being allocated and deallocated symmetrically.


I also think that the primary evil of the additional VOP is that
it takes the code further from where it needs to be.  The abomination
that is NFS cookies is a result of overloading the VOP_LOOKUP code
in order to obtain directory restart, when the underlying FS's
directory entry block entry (struct dirent) is larger than the
one that you proxy over the wire.

I think that the correct way to deal with this is to define an
externalization VOP separate from the VOP_LOOKUP, which will
do the data externalization for you.

This would have the side effect of NFS-izing all future FSs,
since the same code could be used both by NFS and the system
call layer.  Currently, the system call layer does not do
the "cookie dance", and so that code is relatively unmaintained.
If all VFS consumers consumed the same code path, the code in
the path would be maintained.

Anyway, that's my two cents...


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-fs  Wed Nov 24 11: 4:49 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
	by hub.freebsd.org (Postfix) with ESMTP
	id 7AF86153E8; Wed, 24 Nov 1999 11:04:09 -0800 (PST)
	(envelope-from tlambert@usr08.primenet.com)
Received: (from daemon@localhost)
	by smtp02.primenet.com (8.8.8/8.8.8) id MAA03761;
	Wed, 24 Nov 1999 12:03:15 -0700 (MST)
Received: from usr08.primenet.com(206.165.6.208)
 via SMTP by smtp02.primenet.com, id smtpd003665; Wed Nov 24 12:03:07 1999
Received: (from tlambert@localhost)
	by usr08.primenet.com (8.8.5/8.8.5) id LAA21738;
	Wed, 24 Nov 1999 11:55:04 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199911241855.LAA21738@usr08.primenet.com>
Subject: Re: namei() and freeing componentnames
To: eivind@FreeBSD.ORG (Eivind Eklund)
Date: Wed, 24 Nov 1999 18:55:04 +0000 (GMT)
Cc: ezk@cs.columbia.edu, fs@FreeBSD.ORG
In-Reply-To: <19991118153220.E45524@bitbox.follo.net> from "Eivind Eklund" at Nov 18, 99 03:32:20 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> Yes, this is the intent.
> 
> The problem I'm finding with VOP_RELEASEND() is that namei() can
> return two different vps - the dvp (directory vp) and the actual vp
> (inside the directory dvp points at), and that neither of these are
> always available.

What gets returned is based on the flags passed down.  I think
that trying to encapsulate this transparently, so that any
namei() operation that succeeds or fails can be freed in its
entirety without resort to flags specific code in the caller
is a mistake.  I don't think you can reasonably do this.

One issue that occurs to me is that namei() itself, and not the
underlying VOP_LOOKUP code, should be the one to reference the
path component name cache.  If the underlying VFS doesn't want
the cache hit to occur without notifying it of the event, then
it needs to not enter the data in the cache.  This would simplify
a large amount of code.

The other simplification, which is organizational, and could,
using inline functions, be effectively NULL additional code
overhead, is to separate the lookup operations by request
type.  Whether or not something wants the parent directory
back has much to do with whether it is a create or rename
operation, and little to do with anything else.  Operations
which intend to modify the returned directory entry are very
distinct from those merely doing a lookup.

I have often felt that much of the mess create/rename/delete/open
variant behaviour causes should be addressed by moving the
complexity to upper level code.


> Progress report: Based on current rate of progress, it looks like I'll
> be able to have patches ready for (my personal) testing sunday (or
> *possibly* saturday, but most likely not).  Depending on how
> testing/debugging works out, the patches will most likely be ready for
> public testing sometime next week.  I'll need help with NFS testing.

Heh.  This is the same stumbling block I hit, needing help with
NFS testing.  I created, and I believe it was Peter who updated
it, a testing framework that can detect kernel memory leaks from
user space, and which exercised the entire branch path for the
namei()/nameifree() cases.  This would probably be a good thing
for someone to use, since it will identify the branch path in
which any memory leaks are occurring.

> Forward view: I'm undecided on the next step.  Possibilities:
>
> (1) Change the way locking is specified to make it feasible to test
>     locking patches properly, and change the assertion generation to
>     generate better assertions.  This will probably require changing
>     VOP_ISLOCKED() to be able to take a process parameter, and return
>     different values based on whether an exclusive lock is held by that
>     process or by another process.  The present behaviour will be
>     available by passing NULL for this parameter.
> 
>     Presently, running multiple processes does not work properly, as
>     the assertions do not really assert the right things.
> 
>     These changes are necessary to properly debug the use of locks,
>     which I again believe is necessary for stacking layers (which I
>     would like to work in 4.0, but I don't know if I will be able to
>     have ready).

This would be nice; I still believe most of the vnode and the
advisory locking code can move to upper layers.  I think it is
the responsibility of the stacking layers to propagate locks,
and the only place that this is really an issue is on fan-in
or fan-out.
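
To make the VOP_ISLOCKED() proposal in (1) concrete, here is a toy
sketch of the calling convention it describes.  None of these names
come from the FreeBSD sources: toy_islocked(), struct toy_lock and the
TOY_* values are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; not the kernel's struct proc or struct lock. */
struct toy_proc {
	int	pid;
};

struct toy_lock {
	const struct toy_proc	*excl_holder;	/* NULL if none */
	int			 sharecount;	/* shared holders */
};

enum toy_lockstate {
	TOY_UNLOCKED = 0,	/* nobody holds the lock */
	TOY_LOCKED,		/* held; holder not distinguished */
	TOY_EXCL_SELF,		/* exclusively held by the process asked about */
	TOY_EXCL_OTHER		/* exclusively held by some other process */
};

/*
 * With p == NULL this behaves like a plain "is it locked?" query.
 * With a process given, an exclusive lock is reported as held by
 * that process or by another one.
 */
int
toy_islocked(const struct toy_lock *lk, const struct toy_proc *p)
{
	assert(lk != NULL);
	if (lk->excl_holder != NULL) {
		if (p == NULL)
			return (TOY_LOCKED);
		return (lk->excl_holder == p ? TOY_EXCL_SELF : TOY_EXCL_OTHER);
	}
	return (lk->sharecount > 0 ? TOY_LOCKED : TOY_UNLOCKED);
}
```

An assertion such as "the current process holds this vnode
exclusively" can then test for TOY_EXCL_SELF instead of merely
checking that somebody holds the lock.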

Please keep an eye towards not precluding Jeremy Allison's work
on a kernel opportunistic locking interface, since it's really
needed to do hosted OS/host OS coherency properly (e.g. Samba
clients must obey UNIX locks, and UNIX applications must obey
those of Samba).  This is similar to what NFS clients and local
applications must do to interoperate, and is the primary purpose
of the LEASE interface.


> (2) Change the behaviour of VOP_LOOKUP() to "eat as much as you can,
>     and return how much that was" rather than "Eat a single path
>     component; we have already decided what this is."
>     This allows different types of namespaces, and it allows
>     optimizations in VOP_LOOKUP() when several steps in the traversal
>     are inside a single filesystem (and hey - who mounts a
>     new filesystem on every directory they see, anyway?)

The path component buffer mechanism already specifies this behaviour
as one of its initial design requirements, so I think this is already
taken care of.
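
As an illustration of the "eat as much as you can, and return how much
that was" contract, here is a toy user-space sketch.  lookup_eat() is a
hypothetical name, and the flat `known` list is a deliberate
simplification standing in for one filesystem's directory tree; it is
not how VOP_LOOKUP() is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Consume leading path components for as long as they are found in
 * `known` (a NULL-terminated list of names), and report how many
 * bytes of `path` were eaten.  The caller resumes the traversal at
 * path + return value.
 */
size_t
lookup_eat(const char *path, const char *const *known)
{
	size_t eaten = 0;

	assert(path != NULL && known != NULL);
	while (path[eaten] != '\0') {
		const char *comp = path + eaten;
		size_t len = strcspn(comp, "/");
		size_t i;

		for (i = 0; known[i] != NULL; i++)
			if (strlen(known[i]) == len &&
			    strncmp(known[i], comp, len) == 0)
				break;
		if (known[i] == NULL)
			break;		/* not ours: hand back to the caller */
		eaten += len;
		if (path[eaten] == '/')
			eaten++;	/* swallow the separator as well */
	}
	return (eaten);
}
```

The caller (namei() in the real system) would loop, handing the
remaining path to whichever layer owns the next component.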

What does not happen is that lookups that will take place in a
single VFS are not held down in that VFS for the entire traversal,
but instead pop up to namei().

I don't think you can get rid of this, without destroying the
"union" option (not the same as the "unionfs"), and without
damaging the ability to cover mount points and to chroot or
do symlink expansion, or deal with POSIX namespace escape.

The original reason for allowing this behaviour at all, according
to Heidemann's thesis, is to permit an underlying FS to "eat as
much as you want", as opposed to "eat as much as you can".  This
was used in proxy VFS stacking layers, since a proxy layer knows
that it owns the entire tree inferior to the current component.


One "low hanging fruit" optimization that can be made is to
_always_ set the fdp->fd_rdir to the process's current
root directory; this avoids the NULL/non-NULL test, so long
as it is inherited correctly on fork, and set for init.
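
A toy sketch of that invariant follows.  Only the field name fd_rdir
is taken from the text above; the structures and the toy_* functions
are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for struct vnode and struct filedesc. */
struct toy_vnode {
	const char	*name;
};

struct toy_filedesc {
	struct toy_vnode	*fd_rdir;	/* never NULL under the invariant */
};

struct toy_vnode toy_rootvnode = { "/" };

/* init is given the system root explicitly... */
void
toy_init_filedesc(struct toy_filedesc *fdp)
{
	fdp->fd_rdir = &toy_rootvnode;
}

/* ...and fork copies whatever the parent has (possibly a chroot). */
void
toy_fork_filedesc(struct toy_filedesc *child,
    const struct toy_filedesc *parent)
{
	assert(parent->fd_rdir != NULL);
	child->fd_rdir = parent->fd_rdir;
}

/* With the invariant in place, the lookup start needs no NULL test. */
struct toy_vnode *
toy_lookup_root(const struct toy_filedesc *fdp)
{
	return (fdp->fd_rdir);
}
```

Without the invariant, toy_lookup_root() would need a branch that
falls back to the global root whenever fd_rdir is NULL.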

This would be very nice for many other reasons... 8-).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




From owner-freebsd-fs  Wed Nov 24 22:22:24 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from europa.dreamscape.com (europa.dreamscape.com [206.64.128.147])
	by hub.freebsd.org (Postfix) with ESMTP id 11E5A14CEC
	for <freebsd-fs@FreeBSD.ORG>; Wed, 24 Nov 1999 22:22:17 -0800 (PST)
	(envelope-from krentel@dreamscape.com)
Received: from dreamscape.com (sA19-p21.dreamscape.com [209.217.200.84])
          by europa.dreamscape.com (8.8.5/8.8.4) with ESMTP
	  id BAA27780; Thu, 25 Nov 1999 01:21:57 -0500 (EST)
X-Dreamscape-Track-A: sA19-p21.dreamscape.com [209.217.200.84]
X-Dreamscape-Track-B: Thu, 25 Nov 1999 01:21:57 -0500 (EST)
Received: (from krentel@localhost)
	by dreamscape.com (8.9.3/8.9.3) id BAA19286;
	Thu, 25 Nov 1999 01:20:16 -0500 (EST)
	(envelope-from krentel)
Date: Thu, 25 Nov 1999 01:20:16 -0500 (EST)
From: "Mark W. Krentel" <krentel@dreamscape.com>
Message-Id: <199911250620.BAA19286@dreamscape.com>
To: freebsd-fs@FreeBSD.ORG, Thierry.Besancon@lps.ens.fr
Subject: Re: running linux binaries from ext2fs partition
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Thierry Besancon writes:
> Whenever I run an executable residing in the mfs /tmp, it just hangs
> the kernel :

I also mount a MFS on /tmp.  I tried copying ls, find, emacs onto /tmp
and ran them from there.  Works fine for me in 3.3-stable.  But there's
something odd in your mounts:

> Filesystem                 1K-blocks     Used    Avail Capacity  Mounted on
> ...
> mfs:61                          3935     1431     2190    40%    /var
> /var/tmp                        3935     1431     2190    40%    /tmp

You're remounting a subdir of /var onto /tmp?  Wouldn't a symlink be
a better choice here?  That is, don't mount /var/tmp onto /tmp.  Instead,
make /tmp a symlink that points to /var/tmp.  Try that and see if you
still get the crashes.

But I'm still wondering about running binaries from ext2fs.  I got a
panic when I tried this (with a linux binary).  I wouldn't think of 
running programs from a msdos fs, but why not ext2fs?  Is this supported,
or has anyone else tried running linux or freebsd binaries from an
ext2fs partition?

--Mark Krentel




From owner-freebsd-fs  Thu Nov 25  9:22:17 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from ns1.yes.no (ns1.yes.no [195.204.136.10])
	by hub.freebsd.org (Postfix) with ESMTP id 14C3C14EA5
	for <fs@FreeBSD.ORG>; Thu, 25 Nov 1999 09:22:03 -0800 (PST)
	(envelope-from eivind@bitbox.follo.net)
Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218])
	by ns1.yes.no (8.9.3/8.9.3) with ESMTP id SAA27640;
	Thu, 25 Nov 1999 18:22:01 +0100 (CET)
Received: (from eivind@localhost)
	by bitbox.follo.net (8.8.8/8.8.6) id SAA40090;
	Thu, 25 Nov 1999 18:22:00 +0100 (MET)
Date: Thu, 25 Nov 1999 18:22:00 +0100
From: Eivind Eklund <eivind@FreeBSD.ORG>
To: Terry Lambert <tlambert@primenet.com>
Cc: fs@FreeBSD.ORG
Subject: Re: namei() and freeing componentnames
Message-ID: <19991125182159.B602@bitbox.follo.net>
References: <19991112000359.A256@bitbox.follo.net> <199911241819.LAA19803@usr08.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <199911241819.LAA19803@usr08.primenet.com>; from tlambert@primenet.com on Wed, Nov 24, 1999 at 06:19:52PM +0000
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wed, Nov 24, 1999 at 06:19:52PM +0000, Terry Lambert wrote:
> You say that you want it to be reflexive and symmetrical; path
> name buffers are allocated by the VFS consumer.  To achieve
> this goal, they must also be deallocated by the VFS consumer.

I have a series of progressive patches towards this goal available at
http://www.freebsd.org/~eivind/

None of these are expected to be anywhere near working, and I
misread the namei() code enough that there are a bunch of
VOP_RELEASENDs that need to be removed.

Right now, after seeing how much chaos the VOP_RELEASEND stuff turned
into and how many places other code is repeated, I'm tempted to go for
a NDFREE() which can free struct nameidata, *including vrele/vput'ing
acquired vp*, and which takes flags to indicate if it is to leave some
resources behind.

Fortunately, I now have diffs for most of the places where this would
be needed, and have worked with the code in those areas recently, so
it hopefully won't be that much work to convert the diffs to this
model (which would mean that the VOP_RELEASEND that is in those
patches disappears).

> One of the largest barriers to transaction using VFSs in BSD
> at this point is that the VOP_ABORTOP() frees the path name
> buffer, and it should not.

I've noticed :) In the present patches, I am plain slaying
VOP_ABORTOP(), on the basis of it not being used for anything anymore
(all it did in all filesystems we have was to free the pathname), and
intend to have it re-introduced correctly when/if we get a
transactional FS.  I was intending to discuss this once I was at a
point where patches were actually runnable (along with other decisions
I've made while actually hacking the code), though feel free to share
your views on it (since you've brought it into the conversation).

I'll get back to the rest of your message (and the other one) later; I
just wanted to give at least some indication that I am not a black hole.

Eivind.




From owner-freebsd-fs  Thu Nov 25 12: 6: 6 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from excalibur.lps.ens.fr (excalibur.lps.ens.fr [129.199.120.3])
	by hub.freebsd.org (Postfix) with ESMTP id AD0F414DD1
	for <freebsd-fs@FreeBSD.ORG>; Thu, 25 Nov 1999 12:05:58 -0800 (PST)
	(envelope-from Thierry.Besancon@lps.ens.fr)
Received: from (besancon@localhost)
          by excalibur.lps.ens.fr (8.9.3/jtpda-5.3.1) id VAA29616
          ; Thu, 25 Nov 1999 21:05:42 +0100 (MET)
To: "Mark W. Krentel" <krentel@dreamscape.com>
Cc: freebsd-fs@FreeBSD.ORG, Thierry.Besancon@lps.ens.fr
Subject: Re: running linux binaries from ext2fs partition
References: <199911250620.BAA19286@dreamscape.com>
From: Thierry.Besancon@lps.ens.fr
Date: 25 Nov 1999 21:05:41 +0100
In-Reply-To: "Mark W. Krentel"'s message of Thu, 25 Nov 1999 01:20:16 -0500 (EST)
Message-ID: <wnnzow2fiay.fsf@excalibur.lps.ens.fr>
Lines: 47
X-Mailer: Gnus v5.3/Emacs 19.34
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Dixit "Mark W. Krentel" <krentel@dreamscape.com> (le Thu, 25 Nov 1999 01:20:16 -0500 (EST)) :

>> > Whenever I run an executable residing in the mfs /tmp, it just hangs
>> > the kernel :
>> 
>> I also mount a MFS on /tmp.  I tried copying ls, find, emacs onto /tmp
>> and ran them from there.  Works fine for me in 3.3-stable.  But there's
>> something odd in your mounts:

Do remember that I have no swap available.


>> > Filesystem                 1K-blocks     Used    Avail Capacity  Mounted on
>> > ...
>> > mfs:61                          3935     1431     2190    40%    /var
>> > /var/tmp                        3935     1431     2190    40%    /tmp
>> 
>> You're remounting a subdir of /var onto /tmp?  Wouldn't a symlink be
>> a better choice here?  That is, don't mount /var/tmp onto /tmp.  Instead,
>> make /tmp a symlink that points to /var/tmp.  Try that and see if you
>> still get the crashes.

Well, I do it the way /etc/rc.diskless2 does :

        ...
        if [ ! -h /tmp -a ! -h /var/tmp ]; then
                mount_null /var/tmp /tmp
        fi
        ...

Sometimes, you have to trust someone...
I trust FreeBSD guys ;-)

I'll give the symlink a try but, anyway, I found a way to make the
kernel crash at will. If it crashes, it means it is buggy somewhere
and it needs a fix, not a workaround...


>> But I'm still wondering about running binaries from ext2fs.  I got a
>> panic when I tried this (with a linux binary).  I wouldn't think of 
>> running programs from a msdos fs, but why not ext2fs?  Is this supported,
>> or has anyone else tried running linux or freebsd binaries from an
>> ext2fs partition?

I don't have ext2fs...

        Thierry Besancon




From owner-freebsd-fs  Thu Nov 25 16:13: 3 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from sv01.cet.co.jp (sv01.cet.co.jp [210.171.56.2])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4F9ED14D37; Thu, 25 Nov 1999 16:12:59 -0800 (PST)
	(envelope-from michaelh@cet.co.jp)
Received: from localhost (michaelh@localhost)
	by sv01.cet.co.jp (8.9.3/8.9.3) with SMTP id AAA03313;
	Fri, 26 Nov 1999 00:12:57 GMT
Date: Fri, 26 Nov 1999 09:12:57 +0900 (JST)
From: Michael Hancock <michaelh@cet.co.jp>
To: Eivind Eklund <eivind@FreeBSD.ORG>
Cc: Terry Lambert <tlambert@primenet.com>, fs@FreeBSD.ORG
Subject: Re: namei() and freeing componentnames
In-Reply-To: <19991125182159.B602@bitbox.follo.net>
Message-ID: <Pine.BSF.3.95LJ1.1b3.991126091024.3272A-100000@sv01.cet.co.jp>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> Right now, after seeing how much chaos the VOP_RELEASEND stuff turned
> into and how many places other code is repeated, I'm tempted to go for
> a NDFREE() which can free struct nameidata, *including vrele/vput'ing
> acquired vp*, and which takes flags to indicate if it is to leave some
> resources behind.

NDFREE() makes sense, though I'd do the vrele/vput part later as a
separate step. 

Regards,


Mike




From owner-freebsd-fs  Fri Nov 26  3: 5:22 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from ns1.yes.no (ns1.yes.no [195.204.136.10])
	by hub.freebsd.org (Postfix) with ESMTP id 4FD5714F7F
	for <fs@FreeBSD.ORG>; Fri, 26 Nov 1999 03:05:13 -0800 (PST)
	(envelope-from eivind@bitbox.follo.net)
Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218])
	by ns1.yes.no (8.9.3/8.9.3) with ESMTP id MAA10001;
	Fri, 26 Nov 1999 12:05:11 +0100 (CET)
Received: (from eivind@localhost)
	by bitbox.follo.net (8.8.8/8.8.6) id MAA43398;
	Fri, 26 Nov 1999 12:05:11 +0100 (MET)
Date: Fri, 26 Nov 1999 12:05:11 +0100
From: Eivind Eklund <eivind@FreeBSD.ORG>
To: Michael Hancock <michaelh@cet.co.jp>
Cc: Terry Lambert <tlambert@primenet.com>, fs@FreeBSD.ORG
Subject: Re: namei() and freeing componentnames
Message-ID: <19991126120511.E602@bitbox.follo.net>
References: <19991125182159.B602@bitbox.follo.net> <Pine.BSF.3.95LJ1.1b3.991126091024.3272A-100000@sv01.cet.co.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <Pine.BSF.3.95LJ1.1b3.991126091024.3272A-100000@sv01.cet.co.jp>; from michaelh@cet.co.jp on Fri, Nov 26, 1999 at 09:12:57AM +0900
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Fri, Nov 26, 1999 at 09:12:57AM +0900, Michael Hancock wrote:
> > Right now, after seeing how much chaos the VOP_RELEASEND stuff turned
> > into and how many places other code is repeated, I'm tempted to go for
> > a NDFREE() which can free struct nameidata, *including vrele/vput'ing
> > acquired vp*, and which takes flags to indicate if it is to leave some
> > resources behind.
> 
> NDFREE() makes sense, though I'd do the vrele/vput part later as a
> separate step. 

In normal circumstances, I might agree.  However, we have a 4.0
architectural changes freeze coming up, and if we are to handle this
right, we should have free inhibition flags rather than flags saying
what to free (in order to be able to change the definition without
changing all callers, and in order to make the code obvious at the
point of call).
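
A toy sketch of what "free inhibition flags" could look like is given
below.  All names here (toy_ndfree(), the NDF_KEEP_* flags, the toy
structures) are hypothetical and chosen only for illustration; they
are not taken from the patches under discussion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical inhibition flags: each one says what NOT to release. */
#define	NDF_KEEP_PNBUF	0x01	/* leave the path name buffer allocated */
#define	NDF_KEEP_DVP	0x02	/* leave the directory vnode referenced */
#define	NDF_KEEP_VP	0x04	/* leave the result vnode referenced */

/* Toy stand-ins for the kernel structures; not the real definitions. */
struct toy_vnode {
	int	refcount;
};

struct toy_nameidata {
	char			*pnbuf;
	struct toy_vnode	*dvp;
	struct toy_vnode	*vp;
};

/* Drop one reference, standing in for vrele()/vput(). */
static void
toy_vrele(struct toy_vnode *vp)
{
	assert(vp->refcount > 0);
	vp->refcount--;
}

/*
 * Release everything the lookup acquired except what the caller
 * explicitly asks to keep.  Passing 0 frees it all, so a resource
 * added to the structure later is released without touching callers.
 */
void
toy_ndfree(struct toy_nameidata *ndp, unsigned flags)
{
	if (!(flags & NDF_KEEP_PNBUF) && ndp->pnbuf != NULL) {
		free(ndp->pnbuf);
		ndp->pnbuf = NULL;
	}
	if (!(flags & NDF_KEEP_DVP) && ndp->dvp != NULL) {
		toy_vrele(ndp->dvp);
		ndp->dvp = NULL;
	}
	if (!(flags & NDF_KEEP_VP) && ndp->vp != NULL) {
		toy_vrele(ndp->vp);
		ndp->vp = NULL;
	}
}
```

A call such as toy_ndfree(&nd, NDF_KEEP_VP) states at the point of
call exactly which resource the caller still intends to use.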

This means that if we do not do it now, we really should wait to get
close to 5.0-RELEASE to do this, or we need to sync the change into
the 4.0 API after release, in violation of our
releases-have-stable-APIs policy.  I would like to avoid both of these
options.

Eivind.




From owner-freebsd-fs  Fri Nov 26  7:16:28 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from ns1.yes.no (ns1.yes.no [195.204.136.10])
	by hub.freebsd.org (Postfix) with ESMTP id 0100715050
	for <fs@FreeBSD.ORG>; Fri, 26 Nov 1999 07:16:19 -0800 (PST)
	(envelope-from eivind@bitbox.follo.net)
Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218])
	by ns1.yes.no (8.9.3/8.9.3) with ESMTP id QAA14078;
	Fri, 26 Nov 1999 16:16:19 +0100 (CET)
Received: (from eivind@localhost)
	by bitbox.follo.net (8.8.8/8.8.6) id QAA44301;
	Fri, 26 Nov 1999 16:16:18 +0100 (MET)
Date: Fri, 26 Nov 1999 16:16:18 +0100
From: Eivind Eklund <eivind@FreeBSD.ORG>
To: Terry Lambert <tlambert@primenet.com>
Cc: ezk@cs.columbia.edu, fs@FreeBSD.ORG
Subject: Re: namei() and freeing componentnames
Message-ID: <19991126161618.B44210@bitbox.follo.net>
References: <19991118153220.E45524@bitbox.follo.net> <199911241855.LAA21738@usr08.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <199911241855.LAA21738@usr08.primenet.com>; from tlambert@primenet.com on Wed, Nov 24, 1999 at 06:55:04PM +0000
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wed, Nov 24, 1999 at 06:55:04PM +0000, Terry Lambert wrote:
> > Yes, this is the intent.
> > 
> > The problem I'm finding with VOP_RELEASEND() is that namei() can
> > return two different vps - the dvp (directory vp) and the actual vp
> > (inside the directory dvp points at), and that neither of these is
> > always available.
> 
> What gets returned is based on the flags passed down.  I think
> that trying to encapsulate this transparently, so that any
> namei() operation that succeeds or fails can be freed in its
> entirety without resorting to flag-specific code in the caller
> is a mistake.  I don't think you can reasonably do this.

What it presently frees is only the path component buffer.

> One issue that occurs to me is that namei() itself, and not the
> underlying VOP_LOOKUP code, should be the one to reference the
> path component name cache.  If the underlying VFS doesn't want
> the cache hit to occur without notifying it of the event, then
> it needs to not enter the data in the cache.  This would simplify
> a large amount of code.

Where?  How?  I do not quite get this - could you give a few more
details or pointers to some code it would modify?

> The other simplification, which is organizational, and could,
> using inline functions, be effectively NULL additional code
> overhead, is to separate the lookup operations by request
> type.  Whether or not something wants the parent directory
> back has much to do with whether it is a create or rename
> operation, and little to do with anything else.  Operations
> which intend to modify the returned directory entry are very
> distinct from those merely doing a lookup.

I have thought of it, and have been very tempted to do it.  I've not
yet tried to find out how much code impact it would have; there are a
few namei()'s that are at a different layer than the NDINIT()s, and
I've chosen to do the frees at the same layer as the NDINIT(), as
that is where it is decided how the allocation is done (what namei()
allocates depends on the flags).

> I have often felt that much of the mess create/rename/delete/open
> variant behaviour causes should be addressed by moving the
> complexity to upper level code.

I tend to agree, but I am not certain how easy it will be, nor whether
it will end up really clean - I may look at this once I've done the
other cleanups.  I feel it is less important than the rest.


[On changing the detailedness of lock specifications in vnode_if.src,
in order to be able to generate proper lock assertions]

> Please keep an eye towards not precluding Jeremy Allison's work
> on a kernel opportunistic locking interface, since it's really
> needed to do hosted OS/host OS coherency properly (e.g. Samba
> clients must obey UNIX locks, and UNIX applications must obey
> those of Samba).  This is similar to what NFS clients and local
> applications must do to interoperate, and is the primary purpose
> of the LEASE interface.

I must admit to not understanding the lease interface at all.  I do
not think any of the work I am doing at the moment will impact it; I
only deal with vnode locks.

> > (2) Change the behaviour of VOP_LOOKUP() to "eat as much as you can,
> >     and return how much that was" rather than "Eat a single path
> >     component; we have already decided what this is."
> >     This allows different types of namespaces, and it allows
> >     optimizations in VOP_LOOKUP() when several steps in the traversal
> >     are inside a single filesystem (and hey - who mounts a
> >     new filesystem on every directory they see, anyway?)
> 
> The path component buffer mechanism already specifies this behaviour
> as one of its initial design requirements, so I think this is already
> taken care of.
> 
> What does not happen is that lookups that will take place in a
> single VFS are held down in that VFS for the entire traversal;
> instead they pop up to namei().

This was what I wanted to get rid of.
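
A minimal sketch of the "eat as much as you can, and return how much
that was" contract, using a toy function rather than the real
VOP_LOOKUP() signature.  The name toy_lookup() and the `stop` parameter
(standing in for a component the filesystem cannot resolve itself, such
as a covered mount point) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy "filesystem" lookup: consume path components until one is met
 * that this filesystem cannot resolve itself (here: a component equal
 * to `stop`).  Returns the number of bytes consumed, so the caller can
 * resume the traversal at path + consumed.
 */
size_t
toy_lookup(const char *path, const char *stop)
{
    size_t consumed = 0;

    assert(path != NULL && stop != NULL);
    while (path[consumed] != '\0') {
        const char *comp = path + consumed;
        size_t len = strcspn(comp, "/");    /* length of this component */

        if (len == strlen(stop) && strncmp(comp, stop, len) == 0)
            break;                          /* hand back to the caller */
        consumed += len;
        while (path[consumed] == '/')       /* skip separator(s) */
            consumed++;
    }
    return consumed;
}
```

With this shape the caller loops only once per filesystem boundary
instead of once per path component.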

> I don't think you can get rid of this, without destroying the
> "union" option (not the same as the "unionfs"), and without
> damaging the ability to cover mount points and to chroot or
> do symlink expansion, or deal with POSIX namespace escape.

I wanted to do it in order to be able to deal with POSIX namespace
escapes, as the logic for how to handle the namespace would be pushed
downwards, but I might not have thought all the implications through.
I'll admit to working "pseudo-blind" - I do not understand all details
and architecture of the code, and try to understand detail by detail
as I need to in order to bring things forward.

> One "low hanging fruit" optimization that can be made is to
> _always_ set the fdp->fd_rdir to the process's current
> root directory; this avoids the NULL/non-NULL test, so long
> as it is inherited correctly on fork, and set for init.
> 
> This would be very nice for many other reasons... 8-).

Those are the patches that are on your home page on freefall, right?
I've been planning to commit them, I've just not gotten around to it.
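
For readers following along, a rough sketch of the idea as I read it.
The types and function names below are hypothetical stand-ins (struct
fdesc for struct filedesc, struct node for a vnode, global_root for the
system root); this is not the code from the patches.

```c
#include <assert.h>
#include <stddef.h>

struct node  { const char *name; };     /* stand-in for a vnode */
struct fdesc { struct node *rdir; };    /* stand-in for struct filedesc */

struct node global_root = { "/" };      /* stand-in for the system root */

/* With a nullable rdir, every lookup has to test for NULL. */
struct node *
lookup_root_tested(const struct fdesc *fdp)
{
    return fdp->rdir != NULL ? fdp->rdir : &global_root;
}

/* With rdir set for init and inherited on fork, it is never NULL. */
struct node *
lookup_root_always(const struct fdesc *fdp)
{
    assert(fdp->rdir != NULL);          /* invariant, not a code path */
    return fdp->rdir;
}

/* Fork-time inheritance that preserves the invariant. */
void
fdesc_inherit(struct fdesc *child, const struct fdesc *parent)
{
    child->rdir = parent->rdir;
}
```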

Eivind.




From owner-freebsd-fs  Fri Nov 26  7:21:20 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from ns1.yes.no (ns1.yes.no [195.204.136.10])
	by hub.freebsd.org (Postfix) with ESMTP id CB31514C9C
	for <fs@FreeBSD.ORG>; Fri, 26 Nov 1999 07:21:08 -0800 (PST)
	(envelope-from eivind@bitbox.follo.net)
Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218])
	by ns1.yes.no (8.9.3/8.9.3) with ESMTP id QAA14154;
	Fri, 26 Nov 1999 16:21:07 +0100 (CET)
Received: (from eivind@localhost)
	by bitbox.follo.net (8.8.8/8.8.6) id QAA44337;
	Fri, 26 Nov 1999 16:21:07 +0100 (MET)
Date: Fri, 26 Nov 1999 16:21:07 +0100
From: Eivind Eklund <eivind@FreeBSD.ORG>
To: Terry Lambert <tlambert@primenet.com>
Cc: fs@FreeBSD.ORG
Subject: Re: namei() and freeing componentnames
Message-ID: <19991126162107.C44210@bitbox.follo.net>
References: <19991112000359.A256@bitbox.follo.net> <199911241819.LAA19803@usr08.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <199911241819.LAA19803@usr08.primenet.com>; from tlambert@primenet.com on Wed, Nov 24, 1999 at 06:19:52PM +0000
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wed, Nov 24, 1999 at 06:19:52PM +0000, Terry Lambert wrote:
> The main grossness comes from the use of "goto" statements
> and targets in the macro definitions.  This can be alleviated
> by incorporating the path name free into the "bail out" case,
> and preinitializing the path name buffer pointer to NULL so
> that it can be tested for validity on a premature exit.

I've already done this in my patches :)
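
The general shape of that pattern, as a minimal sketch and not the
actual patch: do_lookup() and its parameters are hypothetical, and
malloc()/free() stand in for the kernel's path buffer allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical operation with a single bail-out label.  The buffer
 * pointer starts out NULL, so the exit path can free it unconditionally
 * no matter how early the function gave up.
 * Returns 0 on success, -1 on failure.
 */
int
do_lookup(const char *path, size_t maxlen)
{
    char *pathbuf = NULL;       /* preinitialized: always safe to free */
    int error = -1;

    assert(maxlen > 0);
    if (path == NULL)
        goto out;               /* premature exit before any allocation */
    if (strlen(path) >= maxlen)
        goto out;               /* path would not fit in the buffer */
    pathbuf = malloc(maxlen);
    if (pathbuf == NULL)
        goto out;
    strcpy(pathbuf, path);      /* fits: length was checked above */
    /* ... the lookup itself would go here ... */
    error = 0;
out:
    free(pathbuf);              /* free(NULL) is a no-op */
    return error;
}
```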

> I also think that the primary evil of the additional VOP is that
> it takes the code further from where it needs to be.  The abomination
> that is NFS cookies is a result of overloading the VOP_LOOKUP code
> in order to obtain directory restart, when the underlying FS's
> directory entry block entry (struct dirent) is larger than the
> one that you proxy over the wire.
> 
> I think that the correct way to deal with this is to define an
> externalization VOP separate from the VOP_LOOKUP, which will
> do the data externalization for you.

I do not get this.  Could you give a few more details of what
change(s) you are thinking of?  E.g., a short description of the VOP
you want, including the input parameters and output parameters you
see for it?

Eivind.

