From owner-freebsd-hackers Fri Dec 29 03:18:05 1995
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id DAA14725 for hackers-outgoing; Fri, 29 Dec 1995 03:18:05 -0800 (PST)
Received: from cls.net (freeside.cls.de [192.129.50.1]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id DAA14720 for ; Fri, 29 Dec 1995 03:18:01 -0800 (PST)
Received: by mail.cls.net (Smail3.1.29.1) from allegro.lemis.de (192.109.197.134) with smtp id ; Fri, 29 Dec 95 11:17 GMT
From: grog@lemis.de (Greg Lehey)
Organisation: LEMIS, Schellnhausen 2, 36325 Feldatal, Germany
Phone: +49-6637-919123
Fax: +49-6637-919122
Reply-To: grog@lemis.de (Greg Lehey)
Received: (grog@localhost) by allegro.lemis.de (8.6.9/8.6.9) id MAA05014 for hackers@freebsd.org; Fri, 29 Dec 1995 12:02:29 +0100
Message-Id: <199512291102.MAA05014@allegro.lemis.de>
Subject: Memory leak in -current NFS code?
To: hackers@freebsd.org (FreeBSD Hackers)
Date: Fri, 29 Dec 1995 12:02:27 +0100 (MET)
X-Mailer: ELM [version 2.4 PL23]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Sender: owner-hackers@freebsd.org
Precedence: bulk

In the past couple of days, I've had three cases of the network hanging with the message "No buffer space available". In each case, I've had to reboot the system. If anybody can help me debug this problem, I'd be grateful.

Here's what happens:

A week or so ago, my source disk died on me, and I have cross-mounted the sources on allegro, a BSD/386 1.1 box. During the night, I run a number of cron jobs:

- On freebie, the FreeBSD box, I have a job which extracts the cvs updates and rebuilds the software.

- On allegro, the BSD/386 box, I have a cleanup job which backs up the whole network.

Until the day before yesterday, everything worked well. I had a separate backup on freebie because the additional disk space made it impossible to get everything on one disk.
The day before yesterday I added freebie to the list of the network backups. When I got up yesterday morning, I found that freebie had hung itself up with buffer space problems. The backup (tar cf - / | rsh allegro dd of=$TAPE) had not written any significant quantity of data to tape (tar t on allegro showed no data, just an error), and the rebuild was hanging on a network request. On the console I had the message

  Dec 28 04:47:37 freebie /kernel.std: nfs send error 55 for server allegro.lemis.de:/home

Today exactly the same thing happened, as far as I can tell at the same time. I used ddb to take a dump (BTW, what's the correct way to do that? There doesn't seem to be an instruction to do this, so I trashed callfree). On dumping, I had a significant number of unflushed buffers, which may have something to do with the problem.

During the morning, while I was still trying to rebuild -current, it happened *again*, this time without any help from the backup routines. allegro is quite busy continuing with its interrupted cleanup, and I suspect that the problem might be dropped packets which don't get cleaned up.

FWIW, freebie:/ and freebie:/cdrom are cross-mounted on allegro, and I can't unmount them because allegro claims they're busy--I think this is a BSD/386 NFS problem which occurs after any timeout on an NFS request.

If anybody could give me a few pointers in the code or data structures, I'd be grateful. I know my way round kernel code pretty well, but I don't know much about the network implementation.

Greg