From owner-freebsd-hackers  Fri Dec 29 03:18:05 1995
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id DAA14725
          for hackers-outgoing; Fri, 29 Dec 1995 03:18:05 -0800 (PST)
Received: from cls.net (freeside.cls.de [192.129.50.1])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id DAA14720
          for <hackers@freebsd.org>; Fri, 29 Dec 1995 03:18:01 -0800 (PST)
Received: by mail.cls.net (Smail3.1.29.1)
	  from allegro.lemis.de (192.109.197.134) with smtp
	  id <m0tVcom-0011k1C@cls.net>; Fri, 29 Dec 95 11:17 GMT
From: grog@lemis.de (Greg Lehey)
Organisation: LEMIS, Schellnhausen 2, 36325 Feldatal, Germany
Phone: +49-6637-919123
Fax:   +49-6637-919122
Reply-To: grog@lemis.de (Greg Lehey)
Received: (grog@localhost) by allegro.lemis.de (8.6.9/8.6.9) 
	id MAA05014 for hackers@freebsd.org; Fri, 29 Dec 1995 12:02:29 +0100
Message-Id: <199512291102.MAA05014@allegro.lemis.de>
Subject: Memory leak in -current NFS code?
To: hackers@freebsd.org (FreeBSD Hackers)
Date: Fri, 29 Dec 1995 12:02:27 +0100 (MET)
X-Mailer: ELM [version 2.4 PL23]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Sender: owner-hackers@freebsd.org
Precedence: bulk

In the past couple of days, I've had three cases of the network
hanging with the message "No buffer space available".  In each case,
I've had to reboot the system.  If anybody can help me debug this
problem, I'd be grateful.

Here's what happens:

A week or so ago, my source disk died on me, and I have cross-mounted
the sources on allegro, a BSD/386 1.1 box.  During the night, I run a
number of cron jobs:

- On freebie, the FreeBSD box, I have a job which extracts the cvs
  updates and rebuilds the software.

- On allegro, the BSD/386 box, I have a cleanup job which does a
  backup of the whole network.

Until the day before yesterday, everything worked well.  I had a
separate backup on freebie because the additional disk space made it
impossible to get everything on one disk.  The day before yesterday I
added freebie to the list of the network backups.  When I got up
yesterday morning, I found that freebie had hung itself up with buffer
space problems.  The backup (tar cf - / | rsh allegro dd of=$TAPE) had
not written any significant quantity of data to tape (tar t on allegro
showed no data, just an error), and the rebuild was hanging on a
network request.  On the console I had the message

Dec 28 04:47:37 freebie /kernel.std: nfs send error 55 for server allegro.lemis.de:/home
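
For reference: in the errno numbering of 4.4BSD-derived systems, error
55 is ENOBUFS, whose message text is the "No buffer space available"
quoted at the top of this message.  A minimal sketch in Python of
decoding such a console line follows.  The table is a small excerpt of
the BSD numbering (other systems number these errors differently), and
the names BSD_ERRNO, NFS_SEND_ERROR and decode_nfs_send_error are
hypothetical, chosen for illustration only.

```python
import re

# Excerpt of errno values as numbered on 4.4BSD-derived systems.
# Assumption: the log comes from such a system; Linux, for example,
# uses different numbers for the same names.
BSD_ERRNO = {
    50: ("ENETDOWN", "Network is down"),
    51: ("ENETUNREACH", "Network is unreachable"),
    54: ("ECONNRESET", "Connection reset by peer"),
    55: ("ENOBUFS", "No buffer space available"),
    65: ("EHOSTUNREACH", "No route to host"),
}

# Matches the message format shown in the console line above.
NFS_SEND_ERROR = re.compile(r"nfs send error (\d+) for server (\S+)")


def decode_nfs_send_error(line):
    """Return (code, name, text, server) for an NFS send error line,
    or None if the line does not contain such a message."""
    m = NFS_SEND_ERROR.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    name, text = BSD_ERRNO.get(code, ("unknown", "not in this table"))
    return code, name, text, m.group(2)


if __name__ == "__main__":
    sample = ("Dec 28 04:47:37 freebie /kernel.std: "
              "nfs send error 55 for server allegro.lemis.de:/home")
    print(decode_nfs_send_error(sample))
```

ENOBUFS on a send usually means the kernel could not get buffer space
for the outgoing packet, so it points at buffer exhaustion on the
sending side rather than at a fault on the server.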

Today exactly the same thing happened, as far as I can tell at the
same time.  I used ddb to take a dump (BTW, what's the correct way to
do that?  There doesn't seem to be a ddb command for it, so I
trashed callfree).  On dumping, I had a significant number of
unflushed buffers, which may have something to do with the problem.

During the morning, while I was still trying to rebuild -current, it
happened *again*, this time without any help from the backup
routines.  allegro is quite busy continuing with its interrupted
cleanup, and I suspect that the problem might be dropped packets which
don't get cleaned up.  FWIW, freebie:/ and freebie:/cdrom are
cross-mounted on allegro, and I can't unmount them because allegro
claims they're busy--I think this is a BSD/386 NFS problem which
occurs after any timeout on an NFS request.

If anybody could give me a few pointers in the code or data
structures, I'd be grateful.  I know my way round kernel code pretty
well, but I don't know much about the network implementation.

Greg