From owner-freebsd-current@FreeBSD.ORG Sun Dec 26 18:17:41 2004
Date: Sun, 26 Dec 2004 13:17:38 -0500
From: Bosko Milekic
To: Peter Holm
cc: current@freebsd.org
Subject: Re: panic: uma_zone_slab is looping
Message-ID: <20041226181738.GA21533@technokratis.com>
In-Reply-To: <20041226161153.GA74592@peter.osted.lan>
References: <20041209144233.GA46928@peter.osted.lan> <20041220234103.GA59225@technokratis.com> <20041222210553.GA28108@peter.osted.lan> <20041222221540.GA70052@technokratis.com> <20041226161153.GA74592@peter.osted.lan>
List-Id: Discussions about the use of FreeBSD-current

On Sun, Dec 26, 2004 at 05:11:53PM +0100, Peter Holm wrote:
>
> Yes, I think that I have verified your excellent analysis of the
> problem: http://www.holm.cc/stress/log/freeze04.html
>
> So, do you have any fix suggestions? :-)

  Not yet, because the problem is non-obvious from the trace.  I need
  to know exactly when the UMA RCntSlabs zone recurses _first_, and I
  need to confirm that it is an actual recursion.  I've looked at the
  VM code and I don't see how or why recursion on the RCntSlabs zone
  would happen.

  Please modify the printf code to look exactly like this:

    if (keg->uk_flags & UMA_ZFLAG_INTERNAL && keg->uk_recurse != 0) {
            if ((zone == slabzone) || (zone == slabrefzone))
                    panic("Zone %s forced to fail due to recurse non-null: %d\n",
                        zone->uz_name, keg->uk_recurse);
            return (NULL);
    }

  (You don't need to check any global counter -- the counter is
  imperfect anyway -- because even a single recursion on slabzone or
  slabrefzone should be illegal.)

  I'd like to see the trace from the above panic, if possible.

  Also, from your current crash dump, see if you can print the value
  of keg->uk_recurse (from frame 11, pid 74804).

  It appears that the other KASSERT being triggered from
  propagate_priority() is a side effect of process 74804 looping with
  the UMA RCntSlabs zone lock held (without ever dropping it).  We'll
  have to see.  The point is: the trace is useless unless it shows
  where and when the recursion on slabrefzone _begins_ (not that it
  has already happened; that part is obvious now).

Happy holidays,
-- 
Bosko Milekic
bmilekic@technokratis.com
bmilekic@FreeBSD.org
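
The recursion check requested above can be modeled outside the kernel to see why it catches the _first_ re-entry rather than a later one. What follows is only a minimal user-space sketch: mock_keg, mock_zone, MOCK_ZFLAG_INTERNAL, mock_zone_slab() and the result codes are hypothetical stand-ins invented for illustration, not the real UMA structures or functions, and the place where the kernel would call panic() is represented by a return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; NOT the real UMA types. */
#define MOCK_ZFLAG_INTERNAL 0x01u

struct mock_keg {
    unsigned int flags;   /* models keg->uk_flags */
    int recurse;          /* models keg->uk_recurse */
};

struct mock_zone {
    const char *name;     /* models zone->uz_name */
    struct mock_keg *keg;
    int is_slab_zone;     /* models (zone == slabzone || zone == slabrefzone) */
};

enum mock_result {
    MOCK_OK,      /* allocation path completed normally */
    MOCK_FAIL,    /* recursion on an ordinary internal zone: return NULL */
    MOCK_PANIC    /* recursion on a slab zone: the kernel would panic here */
};

/*
 * Model of one pass through the slab allocation path.  `nested` is the
 * number of times the path re-enters the same zone before completing.
 */
enum mock_result
mock_zone_slab(struct mock_zone *zone, int nested)
{
    struct mock_keg *keg = zone->keg;
    enum mock_result res = MOCK_OK;

    /* Same shape as the check in the message above. */
    if ((keg->flags & MOCK_ZFLAG_INTERNAL) != 0 && keg->recurse != 0) {
        if (zone->is_slab_zone)
            return (MOCK_PANIC);  /* caught at the first re-entry */
        return (MOCK_FAIL);
    }

    keg->recurse++;               /* mark this zone as "in progress" */
    if (nested > 0)
        res = mock_zone_slab(zone, nested - 1);
    keg->recurse--;
    assert(keg->recurse >= 0);    /* entries and exits must balance */

    return (res);
}
```

In this model a call with nested == 0 completes normally, while any nested > 0 on a zone flagged as a slab zone stops at the very first re-entry, which is the point at which a real backtrace would show how the recursion begins.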