From owner-freebsd-current Mon May 12 12:14:57 1997
From: Terry Lambert
Subject: Re: PATCHES: NFS server locking support
To: Andrew.Gordon@net-tel.co.uk
Date: Mon, 12 May 1997 12:08:31 -0700 (MST)
Cc: terry@lambert.org, hackers@FreeBSD.ORG, current@FreeBSD.ORG

Ah, Andrew!  I had hoped to drag you back into this.  8-).

> > Notes: * F_CNVT requires a covenant between the NFS lockd in
> >          user space and the kernel, based on the wire
> >          representation of NFS file handles propagated to
> >          the lockd process.  Because I don't know what this
> >          is from the incomplete user space rpc.lockd code,
> >          this function is stubbed to return ENOSYS.  Once
> >          this information is documented, it will be a simple
> >          matter to call FHTOVP on the user's behalf.
>
> It's not documented in the spec either!  There is a bit of clarification
> in the NFS V3 spec (which sadly documents only the delta from old
> locking to new locking, rather than re-specifying the whole thing):

Maybe I should clarify: I knew you knew the answer based on the work
you had done, but because you stubbed things, the fields didn't get
broken out, so *I* didn't know the answer.  Unfortunately, I don't
have a SunOS machine I can use as a client to decode this myself.

> I found, last time I looked at this, that the encoding for NFS v2 was
> fairly obvious when you looked at a trace of existing implementations
> talking to each other; however, I don't have a pair of third-party
> NFS v3 implementations available to check.

Yes; I did this same thing while at Novell/USG (the former USL), but
I don't remember the data, and wasn't allowed to take it with me when
I left.

> > Note that POSIX close semantics regarding advisory
> > locking are antagonistic to an NFS lockd at this
> > time.  I have not written a POSIX namespace override
> > option for open or for fcntl() at this time.  This
> > means the user space NFS lockd will not be able to
> > coalesce open fd's, and must lazy-close them based
> > on stat information.  This will severely restrict
> > the number of simultaneous locking clients that can
> > be active at one time until these semantic overrides
> > go in.
>
> I don't see why the POSIX close semantics are a problem here - I would
> expect the lockd to hold only one open fd for each file handle (with the
> owner/pid fields in the lock distinguishing individual clients).
> Of course, the limit on open fds per process is potentially limiting
> on a single-process lockd, but there is no obvious way round this.

I expected that each handle that came in from a remote system would
be potentially unique to that system.

Basically, this means that the user space NFS lockd calls F_CNVT to
convert the handle to an fd, and this may result in an fd for a file
which is already open.  I expected to fstat() the fd, and then use
the device/inode pair to hash it and uniquify the fd.  This means
closing the duplicate fd, and that's where the POSIX semantics can
bite you: close the duplicate fd, and you lose the locks on the fd
it duplicates, if the fd obeys POSIX close semantics.

If you can guarantee that you can hash the handle values in user
space, because the handle values are not unique per client system in
the part that gets converted, then you're all set... and it's not a
problem.  F_CNVT will only be called once per hash miss in that case.
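A minimal sketch of that uniquifying step (illustrative only: the
hash table is a toy, error handling is minimal, and the fd is assumed
to have already come back from the proposed F_CNVT call):

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define NHASH   256

    struct fdent {
            dev_t           dev;
            ino_t           ino;
            int             fd;
            struct fdent    *next;
    };

    static struct fdent *fdhash[NHASH];

    /*
     * Given an fd freshly converted from a wire handle, return the
     * canonical fd for the underlying file, closing the new fd if
     * the file is already open.
     */
    int
    uniquify_fd(int fd)
    {
            struct stat st;
            struct fdent *fe;
            unsigned h;

            if (fstat(fd, &st) < 0)
                    return (-1);
            h = (unsigned)(st.st_dev ^ st.st_ino) % NHASH;
            for (fe = fdhash[h]; fe != NULL; fe = fe->next) {
                    if (fe->dev == st.st_dev && fe->ino == st.st_ino) {
                            /*
                             * Danger: under POSIX close semantics,
                             * this close() drops every advisory lock
                             * the process holds on the file, not just
                             * state attached to the duplicate fd.
                             */
                            (void) close(fd);
                            return (fe->fd);
                    }
            }
            if ((fe = (struct fdent *) malloc(sizeof(*fe))) == NULL)
                    return (-1);
            fe->dev = st.st_dev;
            fe->ino = st.st_ino;
            fe->fd = fd;
            fe->next = fdhash[h];
            fdhash[h] = fe;
            return (fd);
    }

The close() is the line that bites; everything else is bookkeeping.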
> > ** The F_UNLKSYS function operates on a single process
> >    open file table.  This means that you cannot have
> >    peer-based load balancing of NFS lockd clients.  This
> >    could be rewritten to traverse the system open file
> >    table instead of the per process open file table.  If
> >    this were done, the restriction would be lifted.  I
> >    am personally more interested in a multithreaded NFS
> >    lockd than in multiple instances of an NFS lockd,
> >    so I have not done this.
>
> Is this how you plan to handle blocking locks?  The one thing that you
> don't appear to have provided is a mechanism for waking up the lockd when
> a previously unavailable lock becomes free (so that lockd can inform the
> client).  If the lockd is multi-threaded to the extent that it can afford
> to have one of its threads go into a blocking fcntl() call for each
> outstanding lock, then this requirement goes away - but that assumes
> kernel threads, and also presents a problem for implementation of the
> nlm_cancel from the client (which cancels a previous blocking lock
> request).

No, this is the "client has crashed" case, where you deassert all the
locks for a given client system.  The call just says "for this client
system, deassert all locks regardless of the process on that system".
This is the reasoning behind the "lf_pid == 0" case: it's a wildcard
value for "any pid for the given lf_rsys".

I expected to handle the blocking locks using one of three methods:

1)	The semantic override.  Because the locks are asserted using
	(struct flock).l_rsys != RSYS_LOCAL (0), we could decide to
	generate select() events for the fd's on which there were
	outstanding locks.

2)	The async call gate.  Ideally, all potentially blocking
	system calls should be callable through an async vs. sync
	trap mechanism that creates an aio context record for the
	call.  This is actually the ideal method of implementing a
	University of Washington style user space threading system,
	either for POSIX user space threads, or for SunOS 4.x liblwp
	Light Weight Processes.  An async call may be waited for or
	cancelled using aiowait() and aiocancel(), respectively.

3)	The poll(2) interface.  This interface allows for events
	other than the read/write/except events available to
	select(); a number of people in the core team were talking
	about integrating the poll(2) code from NetBSD as the basis
	for the select(2) call.  A "lock event" could deal with this.

I suppose you *could* do a thread per client, but without #2 to
support conversion of a blocking call into a non-blocking call plus
a thread context switch, I don't see that this would be very useful,
given how trivial the current implementation of threads is.

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
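For concreteness, a minimal sketch of the thread-per-blocking-lock
variant discussed above (assuming kernel threads, as noted; the
request structure and the NLM reply step are placeholders):

    #include <sys/types.h>
    #include <fcntl.h>
    #include <pthread.h>

    struct blocked_req {
            int             fd;     /* fd obtained from the wire handle */
            struct flock    fl;     /* the client's requested lock */
            int             granted;/* result, for the NLM reply path */
    };

    /*
     * One thread per outstanding blocking lock request; started with
     * pthread_create(&tid, NULL, blocking_lock_thread, br).
     */
    static void *
    blocking_lock_thread(void *arg)
    {
            struct blocked_req *br = arg;

            /*
             * F_SETLKW sleeps until the lock can be granted, so this
             * soaks up a whole thread per blocked client request.
             * Cancelling (nlm_cancel) means interrupting the thread,
             * e.g. with a signal, so that fcntl() returns EINTR.
             */
            br->granted = (fcntl(br->fd, F_SETLKW, &br->fl) == 0);

            /* ... send the NLM "granted" (or denied) callback here ... */
            return (NULL);
    }

Each blocked request pins a whole thread in fcntl(), and nlm_cancel
has to be implemented by interrupting that thread, which is exactly
the problem pointed out above.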