From owner-freebsd-stable@FreeBSD.ORG  Fri Aug  1 11:59:40 2003
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 8A89E37B401
	for <freebsd-stable@freebsd.org>;
	Fri,  1 Aug 2003 11:59:40 -0700 (PDT)
Received: from mail.sandvine.com (sandvine.com [199.243.201.138])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C3C6B43FCB
	for <freebsd-stable@freebsd.org>;
	Fri,  1 Aug 2003 11:59:39 -0700 (PDT)
	(envelope-from don@sandvine.com)
Received: by mail.sandvine.com with Internet Mail Service (5.5.2653.19)
	id <305LHFQJ>; Fri, 1 Aug 2003 14:59:39 -0400
Message-ID: <FE045D4D9F7AED4CBFF1B3B813C85337027420EE@mail.sandvine.com>
From: Don Bowman <don@sandvine.com>
To: "'freebsd-stable@freebsd.org'" <freebsd-stable@freebsd.org>
Date: Fri, 1 Aug 2003 14:59:37 -0400 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
	charset="iso-8859-1"
Subject: RE: kernel deadlock
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
	<freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Aug 2003 18:59:40 -0000

 
> On Tue, 29 Jul 2003, Don Bowman wrote:
> 
> > From: Don Bowman [mailto:don@sandvine.com]
> > >
> > > From: Robert Watson [mailto:rwatson@freebsd.org]
> > > > On Tue, 29 Jul 2003, Dave Dolson wrote:
> > > >
> > > > > To follow up, I've discovered that the system has exhausted
> > > > > its "FFS node" malloc type.
> > >  ...
> > > >
> > > > Some problems with this have turned up in -CURRENT on large-memory
> > > > machines where some of the scaling factors have been off.  In
> > >
> > > > We currently have kern.maxvnodes=70354 set (automatically
> > > > scaled). This is a 1GB box.
> > >
> > > I will try re-running the test with less.
> > >
> > > when it hits kern.maxvnodes, what will it do?
> >
> > After applying the fixes from RELENG_4 for kern/52425,
> > I can still easily reproduce this hang without low memory.
> > Further debugging shows that vnlru process is waiting on
> > vlrup. This line is shown below. i.e. vnlru_nowhere is being
> > incremented every 3 seconds.

So what is happening here is that vnlru wakes up, runs through,
finds nothing to free, and goes back to sleep having freed
nothing. The caller never wakes up. There are no vnodes
to free, and everything in the system locks up.

One possible solution is to make vnlru more aggressive, so
that before giving up, it tries to free pages that have
many references etc. (which it currently skips).
Another option is to have it simply bump the kern.maxvnodes
number and wake up the process which called it.

Suggestions?

--don