From: Terry Lambert
To: durian@plutotech.com (Mike Durian)
Cc: tlambert@primenet.com, hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject: Re: VFS/NFS client wedging problem
Date: Sat, 13 Sep 1997 03:26:24 +0000 (GMT)
Message-Id: <199709130326.UAA07078@usr02.primenet.com>
In-Reply-To: <199709130119.TAA03040@pluto.plutotech.com> from "Mike Durian" at Sep 12, 97 07:19:31 pm

> >If you don't have separate contexts, eventually you'll make a request
> >before the previous one completes.
>
> I serialize.

This is what I figured you had to do, or you'd really be in trouble.

> I used to keep a list of N available sockets and
> use one socket per request, but since I handle commands atomically
> in the user process figured it was silly and dropped down to one
> socket.

If this is a UNIX domain socket, then it's like a pipe.  A pipe does
not guarantee to keep data together across the pipe block size, so if
you are doing writes larger than that, this could be your problem.

You could write:

	AAAAABBBBBCCCCC

And get the data out of order:

	AAAABABBBCBCCCC

Which would account for the failures.

Typically, when I do this, I write data as:

	aAaAaAaAaAbBbBbBbBbBcCcCcCcCcC

where a, b, and c are channel identification tokens.  Then you can
decode:

	aAaAaAaAbBaAbBbBbBcCbBcCcCcCcC

Back into atomic units.  The channel identifiers are per byte.  This
is only one possibility, and depends on the write buffer size.

> The user process is one big select loop, and doesn't
> call select again until it has completed all commands on the
> readable sockets (which is now just one socket).

Did this failure occur when you had separate sockets?  How hard would
it be to go back to a socket per channel as a test case?

> >The NFS export stuff is a bit problematic.  I don't know what to
> >say about it, except that it should be in the common mount code
> >instead of being duplicated per FS.
> >
> >If you can give more architectural data about your FS, and you can
> >give the FS you used as a model of how a VFS should be written, I
> >might be able to give you more detailed help.
> >
> >This is probably something that should be taken off the general
> >-hackers list, and onto fs@freebsd.org
>
> It's really a mish-mash of other file systems.  I grabbed some
> from cd9660 and msdosfs for NFS, socket stuff from portal and
> then nullfs and other miscfs filesystems for general stuff.

This is not going to be a pleasant revelation, I'm afraid.  These are
the worst places to get NFS and VOP_LOCK examples, unfortunately.  The
best place is the ffs/ufs two-layer stack, but it's very complicated
and hard to understand.

The directory code in the msdosfs, in particular, is bad: there is a
race window between unlocking the parent and locking the child.
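Roughly, the window has this shape.  What follows is not the msdosfs
code (or any kernel code at all); it's a toy userland model, with
pthread mutexes standing in for the vnode locks, a string standing in
for the directory entry, and all of the names made up, but it shows
the pattern:

#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t parent_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t child_lock  = PTHREAD_MUTEX_INITIALIZER;
static char dir_entry[32] = "foo";	/* the name the "directory" maps */

static void *
lookup_thread(void *arg)
{
	char resolved[32];

	(void)arg;
	/* Resolve the name while holding the parent lock. */
	pthread_mutex_lock(&parent_lock);
	strcpy(resolved, dir_entry);
	pthread_mutex_unlock(&parent_lock);	/* <-- the window opens here */

	sleep(1);		/* widen the window so it always fires */

	/* Now lock the "child", then recheck against the parent. */
	pthread_mutex_lock(&child_lock);
	pthread_mutex_lock(&parent_lock);
	if (strcmp(resolved, dir_entry) != 0)
		printf("stale lookup: resolved \"%s\", but the entry is now \"%s\"\n",
		    resolved, dir_entry);
	pthread_mutex_unlock(&parent_lock);
	pthread_mutex_unlock(&child_lock);
	return (NULL);
}

static void *
rename_thread(void *arg)
{
	(void)arg;
	/* Sneaks in while the lookup holds neither lock. */
	pthread_mutex_lock(&parent_lock);
	strcpy(dir_entry, "bar");
	pthread_mutex_unlock(&parent_lock);
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, lookup_thread, NULL);
	usleep(100000);		/* let the lookup drop the parent lock first */
	pthread_create(&t2, NULL, rename_thread, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return (0);
}

The lookup resolves the name under the parent lock, drops that lock,
and only then locks the child; the rename slips in while neither lock
is held, so the lookup ends up holding a lock on something that no
longer matches the name it resolved.  The in-kernel lookup path is
exposed to the same kind of window.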
This is pretty much unavoidable (at present) because the VOP_LOOKUP
code structure pushes things that are better left up top down into the
per-FS code (the msdosfs would be able to deal with it if it didn't
have the VOP_ABORTOP issues on create and rename to contend with).

> I'll take all the detailed stuff off this list and move it to
> freebsd-fs.  I didn't know the fs list existed.

Heh.  Most people don't.  It doesn't see much action, because
modifying the interfaces requires huge code shifts: anything that
needs to do that touches every FS at the same time.

> >That's not strange.  It's a request context that's wedged.  When a
> >request context would be slept, the nfsd on the server isn't slept,
> >the context is.  The nfsd provides an execution context for a
> >different request context at that point.  Try nfsstat instead,
> >and/or iostat, on the server.
>
> I didn't realize that.  I did use nfsstat, but didn't know what
> to look for.  The only thing that seemed interesting to me was
> the 190 server faults.  But I didn't know if that was normal or not.

I have 0 here, but then my stuff is pretty hacked up compared to the
standard distribution, so I have no way of knowing whether faults are
the normal state of affairs or not.  Doug Rabson would know.

> >This proves to us that it isn't async requests over the wire that
> >are hosing you.  That the server is an NFSv3-capable server argues
> >that the v2 protocol is implemented by a v3 engine, which would
> >explain the blockages.
> >
> >Have you tried both TCP and UDP based mounts?
>
> Yes.  UDP locked up faster than TCP (though that is a subjective
> measurement, I didn't actually time things).  TCP had the "server not
> responding"/"responding again" messages.

This rules out "source host not equal to mount host" errors.  It's a
good data point for eliminating an obvious case... even negative data
is still data.  8-(.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.