From owner-freebsd-hackers  Fri Sep 12 15:31:48 1997
Message-Id: <199709122231.QAA00374@pluto.plutotech.com>
From: "Mike Durian" <durian@plutotech.com>
To: hackers@freebsd.org
Subject: VFS/NFS client wedging problem
Date: Fri, 12 Sep 1997 16:31:43 -0600

  I've got a VFS problem I'm hoping someone out there can give
me some ideas on.  I've written a VFS-based filesystem that is
an interface to our RAID system.  The RAID system stores video
frames and the filesystem allows access to the data and automatically
translates the data to a variety of file formats (TIFF, Targa, YUV,
etc.).  The frame number and conversion type are defined by the
path name, e.g. /pfs/frames/tiff/0.tiff or
/pfs/HMSF/tga/hour00/minute01/second10//00.01.10.29.tga.
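  As an illustration of the naming scheme, here is a minimal C sketch
that decodes the two example layouts above into a conversion type and
a frame number.  It is not the filesystem's actual code: the names
pfs_parse and struct pfs_name are hypothetical, paths are taken
relative to the /pfs mount point with repeated slashes already
collapsed, and the check that the H.M.S digits in the file name agree
with the directory components is an assumption drawn from the single
example given.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type; not taken from the real filesystem. */
struct pfs_name {
    char fmt[8];                /* conversion type from the path, e.g. "tiff" */
    int  hmsf;                  /* 1 if the path used the HMSF layout */
    int  hour, minute, second;  /* only meaningful when hmsf == 1 */
    long frame;                 /* frame number (within the second for HMSF) */
};

/*
 * Decode a path relative to the mount point.  Returns 0 on success and
 * -1 if the path matches neither layout; *out is unspecified on failure.
 */
int
pfs_parse(const char *path, struct pfs_name *out)
{
    char ext[8];        /* file extension; parsed but not interpreted */
    int fh, fm, fs;     /* H.M.S fields repeated in the file name */
    int n = 0;          /* characters consumed, filled in by %n */

    assert(path != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    /* Layout 1: frames/<fmt>/<frame>.<ext> */
    if (sscanf(path, "frames/%7[a-z]/%ld.%7[a-z]%n",
        out->fmt, &out->frame, ext, &n) == 3 && path[n] == '\0')
        return (out->frame >= 0 ? 0 : -1);

    /* Layout 2: HMSF/<fmt>/hourHH/minuteMM/secondSS/HH.MM.SS.FF.<ext> */
    n = 0;
    if (sscanf(path,
        "HMSF/%7[a-z]/hour%d/minute%d/second%d/"
        "%d.%d.%d.%ld.%7[a-z]%n",
        out->fmt, &out->hour, &out->minute, &out->second,
        &fh, &fm, &fs, &out->frame, ext, &n) == 9 && path[n] == '\0') {
        /* Assumption: the file name must agree with its directories. */
        if (fh != out->hour || fm != out->minute || fs != out->second)
            return (-1);
        if (out->frame < 0)
            return (-1);
        out->hmsf = 1;
        return (0);
    }

    return (-1);
}
```

  With this sketch, pfs_parse("frames/tiff/0.tiff", &n) yields
conversion type "tiff" and frame 0.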
  The filesystem is implemented partially in the kernel and partially
as a user application.  The two parts communicate via a socket.
  The filesystem works well for normal accesses, but I'm having
a strange problem with NFS.  I've supplied the fhtovp and vptofh
hooks and things basically work, but I can get the client side
wedged under heavy accesses.
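  For readers unfamiliar with the two hooks: vptofh turns a vnode into
the opaque file handle that the NFS server hands to clients, and
fhtovp must later turn that handle back into a vnode, possibly long
after the original vnode has been recycled.  Since files here are
synthesized from the path name, one plausible approach is to encode
the conversion type and frame number directly in the handle.  The
userland sketch below shows only that round trip.  The layout, the
magic number and all pfs_* names are hypothetical; they are neither
the real kernel interfaces nor this filesystem's actual handle format.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PFS_FID_MAGIC 0x50465331u   /* hypothetical tag, ASCII "PFS1" */
#define PFS_FID_LEN   12            /* magic + fmt + frame, 4 bytes each */

/* Hypothetical identity of one synthesized file. */
struct pfs_id {
    uint32_t fmt;       /* index of the conversion type */
    uint32_t frame;     /* absolute frame number */
};

/* Store a 32-bit value in big-endian order, independent of host order. */
static void
put32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Load a 32-bit big-endian value. */
static uint32_t
get32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
        ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* vptofh analogue: serialize the identity into opaque handle bytes. */
void
pfs_id_to_fid(const struct pfs_id *id, unsigned char fid[PFS_FID_LEN])
{
    assert(id != NULL && fid != NULL);
    put32(fid, PFS_FID_MAGIC);
    put32(fid + 4, id->fmt);
    put32(fid + 8, id->frame);
}

/*
 * fhtovp analogue: validate the handle and recover the identity.
 * Returns 0 on success, or ESTALE if the handle does not carry our tag.
 */
int
pfs_fid_to_id(const unsigned char fid[PFS_FID_LEN], struct pfs_id *id)
{
    assert(fid != NULL && id != NULL);
    if (get32(fid) != PFS_FID_MAGIC)
        return (ESTALE);
    id->fmt = get32(fid + 4);
    id->frame = get32(fid + 8);
    return (0);
}
```

  The point of such a layout is that decoding needs no per-file state:
any handle carrying the tag can be mapped back to a frame, and
anything else is rejected as stale.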
  If I run four simultaneous processes copying data to my filesystem,
after a while I'll see one of the nfsiods go to sleep on "vfsfsy"
and not return.  Eventually, the other nfsiods will go to sleep on
"nfsrcv" and that's that.
  In both cases, it looks like the clients aren't getting ACKs
from the server.  Strangely, none of the nfsd processes on the
server are sleeping and the user mount_pfs process isn't sleeping
either.  In fact the filesystem is still perfectly usable.  It's
just the NFS client that is wedged.
  I'm not sure where the problem lies.  Is it an NFS issue or
a (more likely) bug in my filesystem?  Does anybody have any ideas
on why an NFS server might drop an ACK and wedge the client?  I
haven't been able to find any paths through the NFS code that would
lead to this condition, but then to me the NFS code is like a
maze of twisty passages, all alike.
  I got the same results with both NFSv3 and NFSv2.  TCP failed
too, though it lasted longer, with a number of "server not
responding"/"server responding again" messages.  The messages would
appear back to back without the server really going down.

Thanks,
mike