From: Don Lewis
To: current@FreeBSD.org
Date: Wed, 18 Jun 2003 21:39:20 -0700 (PDT)
Subject: Re: fun with WITNESS and "pool mutex"

On 18 Jun, I wrote:
> When I was attempting to debug a system deadlock problem where the
> culprit process was sleeping on a "pool mutex", I noticed that "show
> witness" in ddb doesn't report anything about this particular mutex
> flavor. I discovered that witness doesn't monitor these mutexes because
> mtx_pool_setup() calls mtx_init() with the MTX_NOWITNESS flag.
>
> These mutexes are a bit special, because they are supposed to be leaf
> mutexes: no other mutex should be acquired while one of them is held.
> The deadlock in question caused me to discover a violation of this
> restriction, so I wondered if there were more problems of this type in
> the code.
> I suspected there would be, since there haven't been any automatic
> checks to verify that these mutexes are being used correctly.
>
> Just for grins, I removed the MTX_NOWITNESS flag from mtx_pool_setup()
> and quickly found the first violation during the boot sequence:

[ snip - I committed a patch ]

> Any bets on how many other potential deadlock problems there are in the
> tree?

The only problems I've found so far are in fdrop_locked() and
kern_open(), so things might not be as bleak as I initially feared.

I also got this lock order reversal (LOR) message from witness about the
sx lock code:

lock order reversal
 1st 0xc05e1020 pool mutex (pool mutex) @ /usr/src/sys/kern/kern_sx.c:111
 2nd 0xc05dfa00 module subsystem sx lock (module subsystem sx lock) @ /usr/src/sys/kern/kern_module.c:126

I *think* this is actually a safe use of a pool mutex. What would be the
best way to quiet the complaint? The two possibilities that I can think
of are to handle this as a special case in the witness code, or to
slightly rearrange the code in kern_sx.c to swap the order of the
WITNESS_LOCK() and mtx_unlock() calls in _sx*_lock(), so that witness
records the sx lock acquisition only after the pool mutex has been
released.
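To make the second option concrete, here is a hypothetical userland model of the ordering question. It is not the kernel's WITNESS or kern_sx.c code: every toy_* name is invented for illustration, a pthread mutex stands in for the pool mutex, the sx lock supports shared mode only, and the "witness" enforces just one rule (nothing may be recorded as acquired while a leaf lock is held). It only shows that the report depends on whether the sx acquisition is recorded before or after the interlock is dropped.

```c
/*
 * Hypothetical userland model, not FreeBSD kernel code: a toy witness
 * enforcing only the leaf-lock rule, plus a shared-only toy sx lock.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_MAX_HELD 8

struct toy_lock {
	const char *name;
	bool leaf;		/* true for pool mutexes */
};

/* Locks currently recorded as held by this thread. */
static _Thread_local const struct toy_lock *held[TOY_MAX_HELD];
static _Thread_local int nheld;
static int violations;		/* leaf-order reports so far */

/* Record an acquisition; complain if any held lock is a leaf. */
static void toy_witness_lock(const struct toy_lock *lk)
{
	for (int i = 0; i < nheld; i++) {
		if (held[i]->leaf) {
			fprintf(stderr, "order violation: %s acquired while "
			    "holding leaf %s\n", lk->name, held[i]->name);
			violations++;
		}
	}
	assert(nheld < TOY_MAX_HELD);
	held[nheld++] = lk;
}

/* Record a release; releases are never order-checked. */
static void toy_witness_unlock(const struct toy_lock *lk)
{
	for (int i = 0; i < nheld; i++) {
		if (held[i] == lk) {
			nheld--;
			held[i] = held[nheld];	/* order of entries is irrelevant */
			return;
		}
	}
	assert(!"toy_witness_unlock: lock not held");
}

/* Stand-in for one entry of the mutex pool. */
static pthread_mutex_t toy_pool_mtx = PTHREAD_MUTEX_INITIALIZER;
static const struct toy_lock toy_pool_obj = { "pool mutex", true };

struct toy_sx {
	struct toy_lock obj;	/* witness identity of the sx lock */
	int sharers;		/* protected by toy_pool_mtx */
};

static void toy_pool_lock(void)
{
	pthread_mutex_lock(&toy_pool_mtx);
	toy_witness_lock(&toy_pool_obj);
}

static void toy_pool_unlock(void)
{
	toy_witness_unlock(&toy_pool_obj);
	pthread_mutex_unlock(&toy_pool_mtx);
}

/* Current order: the sx lock is recorded while the pool mutex is held. */
static void toy_sx_slock_before(struct toy_sx *sx)
{
	toy_pool_lock();
	sx->sharers++;
	toy_witness_lock(&sx->obj);	/* reported: leaf still held */
	toy_pool_unlock();
}

/* Proposed order: drop the pool mutex first, then record the sx lock. */
static void toy_sx_slock_after(struct toy_sx *sx)
{
	toy_pool_lock();
	sx->sharers++;
	toy_pool_unlock();
	toy_witness_lock(&sx->obj);	/* no leaf held, no report */
}

static void toy_sx_sunlock(struct toy_sx *sx)
{
	toy_witness_unlock(&sx->obj);
	toy_pool_lock();
	sx->sharers--;
	toy_pool_unlock();
}
```

Calling toy_sx_slock_before() on a fresh toy_sx bumps violations to 1, while toy_sx_slock_after() leaves it unchanged. The model says nothing about whether the swap is safe in the real kernel, which is the question being asked; it only shows why the report goes away.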