From owner-freebsd-current Mon May 12 12:14:57 1997
From: Terry Lambert
Subject: Re: PATCHES: NFS server locking support
To: Andrew.Gordon@net-tel.co.uk
Date: Mon, 12 May 1997 12:08:31 -0700 (MST)
Cc: terry@lambert.org, hackers@FreeBSD.ORG, current@FreeBSD.ORG

Ah, Andrew!  I had hoped to drag you back into this.  8-).

> > Notes: * F_CNVT requires a covenant between the NFS lockd in
> >          user space and the kernel, based on the wire
> >          representation of NFS file handles propagated to
> >          the lockd process.  Because I don't know what this
> >          is from the incomplete user space rpc.lockd code,
> >          this function is stubbed to return ENOSYS.  Once
> >          this information is documented, it will be a simple
> >          matter to call FHTOVP on the user's behalf.
>
> It's not documented in the spec either!  There is a bit of clarification
> in the NFS V3 spec (which sadly documents only the delta from old
> locking to new locking, rather than re-specifying the whole thing):

Maybe I should clarify: I knew you knew the answer based on the work
you had done, but because you stubbed things, the fields didn't get
broken out, so *I* didn't know the answer.  Unfortunately, I don't
have a SunOS machine I can use as a client to decode this myself.

> I found, last time I looked at this, that the encoding for NFS v2 was
> fairly obvious when you looked at a trace of existing implementations
> talking to each other; however, I don't have a pair of third-party
> NFS v3 implementations available to check.

Yes; I did this same thing while at Novell/USG (the former USL), but
I don't remember the data, and wasn't allowed to take it with me when
I left.

> > Note that POSIX close semantics regarding advisory
> > locking are antagonistic to an NFS lockd at this
> > time.  I have not written a POSIX namespace override
> > option for open or for fcntl() at this time.  This
> > means the user space NFS lockd will not be able to
> > coalesce open fd's, and must lazy-close them based
> > on stat information.  This will severely restrict
> > the number of simultaneous locking clients that can
> > be active at one time until these semantic overrides
> > go in.
>
> I don't see why the POSIX close semantics are a problem here - I would
> expect the lockd to hold only one open fd for each file handle (with the
> owner/pid fields in the lock distinguishing individual clients).
> Of course, the limit on open fds per process is potentially limiting
> on a single-process lockd, but there is no obvious way round this.

I expected that each handle that came in from a remote system would
be potentially unique to that system.

Basically, this means that the user space NFS lockd calls F_CNVT to
convert the handle to an fd, and this may result in an fd for a file
which is already open.  I expected to fstat() the fd, and then use
the device/inode pair to hash it and uniquify the fd.  This means
closing the duplicate fd, and that's where the POSIX semantics can
bite you: close the duplicate fd, and you lose the locks on the fd
it duplicates, if the fd obeys POSIX close semantics.

If you can guarantee that you can hash the handle values in user
space, because the handle values are not unique per client system in
the part that gets converted, then you're all set... and it's not a
problem.  F_CNVT will only be called once per hash miss in that case.
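A minimal sketch of that uniquifying step (illustrative only: the
hash table is a toy, error handling is minimal, and the fd is assumed
to have already come back from the proposed F_CNVT call):

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define NHASH   256

    struct fdent {
            dev_t           dev;
            ino_t           ino;
            int             fd;
            struct fdent    *next;
    };

    static struct fdent *fdhash[NHASH];

    /*
     * Given an fd freshly converted from a wire handle, return the
     * canonical fd for the underlying file, closing the new fd if
     * the file is already open.
     */
    int
    uniquify_fd(int fd)
    {
            struct stat st;
            struct fdent *fe;
            unsigned h;

            if (fstat(fd, &st) < 0)
                    return (-1);
            h = (unsigned)(st.st_dev ^ st.st_ino) % NHASH;
            for (fe = fdhash[h]; fe != NULL; fe = fe->next) {
                    if (fe->dev == st.st_dev && fe->ino == st.st_ino) {
                            /*
                             * Danger: under POSIX close semantics,
                             * this close() drops every advisory lock
                             * the process holds on the file, not just
                             * state attached to the duplicate fd.
                             */
                            (void) close(fd);
                            return (fe->fd);
                    }
            }
            if ((fe = (struct fdent *) malloc(sizeof(*fe))) == NULL)
                    return (-1);
            fe->dev = st.st_dev;
            fe->ino = st.st_ino;
            fe->fd = fd;
            fe->next = fdhash[h];
            fdhash[h] = fe;
            return (fd);
    }

The close() is the line that bites; everything else is bookkeeping.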
> > ** The F_UNLKSYS function operates on a single process
> >    open file table.  This means that you cannot have
> >    peer-based load balancing of NFS lockd clients.  This
> >    could be rewritten to traverse the system open file
> >    table instead of the per process open file table.  If
> >    this were done, the restriction would be lifted.  I
> >    am personally more interested in a multithreaded NFS
> >    lockd than in multiple instances of an NFS lockd,
> >    so I have not done this.
>
> Is this how you plan to handle blocking locks?  The one thing that you
> don't appear to have provided is a mechanism for waking up the lockd when
> a previously unavailable lock becomes free (so that lockd can inform the
> client).  If the lockd is multi-threaded to the extent that it can afford
> to have one of its threads go into a blocking fcntl() call for each
> outstanding lock, then this requirement goes away - but that assumes
> kernel threads, and also presents a problem for implementation of the
> nlm_cancel from the client (which cancels a previous blocking lock
> request).

No, this is the "client has crashed" case, where you deassert all the
locks for a given client system.  The call just says "for this client
system, deassert all locks regardless of the process on that system".
This is the reasoning behind the "lf_pid == 0" case: it's a wildcard
value for "any pid for the given lf_rsys".

I expected to handle the blocking locks using one of three methods:

1)	The semantic override.  Because the locks are asserted using
	(struct flock).l_rsys != RSYS_LOCAL (0), we could decide to
	generate select() events for the fd's on which there were
	outstanding locks.

2)	The async call gate.  Ideally, all potentially blocking
	system calls should be callable through an async vs. sync
	trap mechanism that creates an aio context record for the
	call.  This is actually the ideal method of implementing a
	University of Washington style user space threading system,
	either for POSIX user space threads, or for SunOS 4.x liblwp
	Light Weight Processes.  An async call may be waited for or
	cancelled using aiowait() and aiocancel(), respectively.

3)	The poll(2) interface.  This interface allows for events
	other than the read/write/except events available to
	select(); a number of people in the core team were talking
	about integrating the poll(2) code from NetBSD as the basis
	for the select(2) call.  A "lock event" could deal with this.

I suppose you *could* do a thread per client, but without #2 to
support conversion of a blocking call into a non-blocking call plus
a thread context switch, I don't see that this would be very useful,
given how trivial the current implementation of threads is.

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
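For concreteness, a minimal sketch of the thread-per-blocking-lock
variant discussed above (assuming kernel threads, as noted; the
request structure and the NLM reply step are placeholders):

    #include <sys/types.h>
    #include <fcntl.h>
    #include <pthread.h>

    struct blocked_req {
            int             fd;     /* fd obtained from the wire handle */
            struct flock    fl;     /* the client's requested lock */
            int             granted;/* result, for the NLM reply path */
    };

    /*
     * One thread per outstanding blocking lock request; started with
     * pthread_create(&tid, NULL, blocking_lock_thread, br).
     */
    static void *
    blocking_lock_thread(void *arg)
    {
            struct blocked_req *br = arg;

            /*
             * F_SETLKW sleeps until the lock can be granted, so this
             * soaks up a whole thread per blocked client request.
             * Cancelling (nlm_cancel) means interrupting the thread,
             * e.g. with a signal, so that fcntl() returns EINTR.
             */
            br->granted = (fcntl(br->fd, F_SETLKW, &br->fl) == 0);

            /* ... send the NLM "granted" (or denied) callback here ... */
            return (NULL);
    }

Each blocked request pins a whole thread in fcntl(), and nlm_cancel
has to be implemented by interrupting that thread, which is exactly
the problem pointed out above.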