From: "Mike Durian"
To: hackers@freebsd.org
Subject: VFS/NFS client wedging problem
Date: Fri, 12 Sep 1997 16:31:43 -0600

I've got a VFS problem I'm hoping someone out there can give me some ideas on. I've written a VFS-based filesystem that is an interface to our RAID system. The RAID system stores video frames, and the filesystem allows access to the data and automatically translates the data to a variety of file formats (TIFF, Targa, YUV, etc.). The frame number and conversion type are defined by the path name, e.g. /pfs/frames/tiff/0.tiff or /pfs/HMSF/tga/hour00/minute01/second10//00.01.10.29.tga.

The filesystem is implemented partially in the kernel and partially as a user application. The two parts communicate via a socket.

The filesystem works well for normal accesses, but I'm having a strange problem with NFS. I've supplied the fhtovp and vptofh hooks and things basically work, but I can get the client side wedged under heavy access. If I run four simultaneous processes copying data to my filesystem, after a while I'll see one of the nfsiods go to sleep on "vfsfsy" and not return. Eventually, the other nfsiods will go to sleep on "nfsrcv" and that's that. In both cases, it looks like the clients aren't getting ACKs from the server.
Strangely, none of the nfsd processes on the server are sleeping, and the user mount_pfs process isn't sleeping either. In fact, the filesystem is still perfectly usable. It's just the NFS client that is wedged.

I'm not sure where the problem lies. Is it an NFS issue or (more likely) a bug in my filesystem? Does anybody have any ideas on why an NFS server might drop an ACK and wedge the client? I haven't been able to find any paths through the NFS code that would lead to this condition, but then to me the NFS code is like a maze of twisty passages, all alike.

I did get the same results with both NFSv3 and NFSv2. TCP failed too, though it lasted longer, with a number of "server not responding"/"server responding again" messages. The messages would appear back to back without the server really going down.

Thanks,
mike