From owner-freebsd-hackers Wed Oct 25 09:47:25 1995
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.6.12/8.6.6) id JAA25868 for hackers-outgoing; Wed, 25 Oct 1995 09:47:25 -0700
Received: from eldorado.net-tel.co.uk (eldorado.net-tel.co.uk [193.122.171.253]) by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id JAA25847 for ; Wed, 25 Oct 1995 09:47:14 -0700
From: Andrew.Gordon@net-tel.co.uk
Received: (from root@localhost) by eldorado.net-tel.co.uk (8.6.12/8.6.10) id RAA04194; Wed, 25 Oct 1995 17:46:31 +0100
X400-Received: by mta "eldorado" in "/PRMD=net-tel/ADMD=gold 400/C=gb/"; Relayed; Wed, 25 Oct 95 17:43:11 +0100
X400-Received: by mta "net-tel cambridge" in "/PRMD=net-tel/ADMD=gold 400/C=gb/"; Relayed; Wed, 25 Oct 95 16:43:08 +0000
X400-Received: by "/PRMD=NET-TEL/ADMD=Gold 400/C=GB/"; Relayed; Wed, 25 Oct 95 16:43:08 +0000
X400-MTS-Identifier: ["/PRMD=NET-TEL/ADMD=Gold 400/C=GB/";hst:369-951025164308-2939]
X400-Content-Type: P2-1984 (2)
X400-Originator: Andrew.Gordon@net-tel.co.uk
Original-Encoded-Information-Types: IA5-Text
X400-Recipients: non-disclosure:;
Date: Wed, 25 Oct 95 16:43:08 +0000
Content-Identifier: Re(2): panic: fr
Message-Id: <"MAC-951025164255-1E49*/G=Andrew/S=Gordon/O=NET-TEL Computer Systems Ltd/PRMD=NET-TEL/ADMD=Gold 400/C=GB/"@MHS>
To: FREEBSD-HACKERS-L
Cc: taob@io.org, David Greenman
In-Reply-To: <"SunOS:857-951025122108-11BB*/DD.RFC-822=owner-hackers(a)freebsd.org/O=internet/PRMD=NET-TEL/ADMD=GOLD 400/C=GB/"@MHS>
Subject: Re(2): panic: free vnode isn't
Sender: owner-hackers@freebsd.org
Precedence: bulk

> Nope.  I switched back to the non-debugging kernel today and it
> only ran for about a day before it locked up.  Again, no syslog
> messages or any indication that something went wrong (except that
> everything is frozen).  We left the office at 9:20pm for dinner, and
> it died at 9:30pm (figures).
> ...
> too much of a load on the server?
> The FTP server is chroot'd to a
> local directory, but anything beneath ~ftp/pub is NFS-mounted.  All
> user home directories are also NFS-mounted.

This sounds a little like a problem we have been having - which I
haven't reported previously as I have not had time to characterise it
properly.  However, it goes something like this...

Server is running 2.0.5R (with more recent SCSI drivers from -STABLE a
few weeks ago, probably not important).  It is configured as both NFS
server and client, and also runs SAMBA to serve files to Windows
machines.  Some of the Windows users mount partitions through SAMBA
which are in turn NFS mounted by the server from a third machine.

All this worked fine for some months, although with very occasional
freezes - usually when one of the DOS machines had been crashed and
rebooted, but otherwise inexplicable.

More recently, Windows 95 has appeared on the scene, and when a Win95
machine accesses files on one of these SAMBA->NFS mounted partitions,
the freeze happens consistently.

The nature of the freeze is some kind of deadlock in the filesystem -
if you catch it just after the freeze, terminal/telnet sessions are
normally still alive, but as soon as they touch the filesystem they
also block forever - and things like mail check in the shell etc. mean
that most processes end up frozen after a few minutes.

The problem has completely gone away since we moved all the files used
by the Windows users onto local discs on the server, so the problem
would appear to lie in the NFS client code.

My best guess is that this relates to the fact that SAMBA appears to do
non-blocking I/O on files, in order to serve multiple requests in
parallel from a given client (there is one SAMBA process per client) -
presumably Win3.11 never makes multiple requests in parallel apart from
the special case of crashing between submitting a request and getting
the result, whereas Win95 requests in parallel as a matter of course???
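
Because anything that touches the hung filesystem blocks forever, it
helps to check a mount point from a throwaway child process rather than
from the shell you are working in.  What follows is only a minimal
sketch of that idea, not something from the setup described above: the
name probe(), the result strings and the 5-second default timeout are
all hypothetical choices made for illustration.

```python
import subprocess
import sys

# Code run by the child: a single stat() of the path given as argv[1].
_CHILD = "import os, sys; os.stat(sys.argv[1])"


def probe(path, timeout=5.0):
    """Return 'ok', 'error' or 'timeout' for one stat() of path.

    The stat() happens in a separate process, so if the filesystem is
    hung only that child blocks and the caller gets 'timeout' back.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck inside the kernel may not die
        # until the filesystem responds, so we deliberately do not wait.
        proc.kill()
        return "timeout"
    return "ok" if rc == 0 else "error"


if __name__ == "__main__":
    # Usage: pass one or more paths (for example NFS mount points).
    for p in sys.argv[1:]:
        print(p, probe(p))
```

Run periodically against each NFS-mounted directory, something like
this would at least record when the hang started without adding one
more frozen login session to the pile.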
Changing compile options on SAMBA to USE_MMAP=1 appeared to have a
beneficial effect, though I can't afford to run this sort of test on
our main fileserver to be sure.

[Apologies if this is a known bug or not relevant to the problem in
hand - I had been meaning to set up a spare machine with -STABLE and
reproduce the problem there before posting about it].

Andrew.
andrew.gordon@net-tel.co.uk