From: Tomas Olsson
To: Alec Kloss
Cc: afs@FreeBSD.org, arla-drinkers@stacken.kth.se, Robert Watson, Garance A Drosehn, Rasmus Kaj
Date: Fri, 07 Mar 2008 07:43:46 +0100
Subject: Re: arla-devel port for FreeBSD (was: Patches to get Arla running on FreeBSD 8-CURRENT)
Message-Id: <1204872226.4059.15.camel@hippo.t.nxs.se>
List-Id: The Andrew File System and FreeBSD

On Thu, 2008-03-06 at 20:49 -0600, Alec Kloss wrote:
> Anyway, Tomas, or others, do you have any hints for me about how
> best to start diagnosing and maybe fixing issues? The most
> repeatable way I've found to get bad behavior is to rsync -a
> /usr/src and /usr/obj into AFS. After 30 seconds or so of this,
> I'll start getting messages like these:
>
> lockmgr: thread 0xc6970840 unlocking unheld lock
> lockmgr: thread 0xc6970840 unlocking unheld lock
> lockmgr: thread 0xc6970840 unlocking unheld lock
> lockmgr: thread 0xc6970840 unlocking unheld lock
> lockmgr: thread 0xc6970840 unlocking unheld lock
>
> on the console. Eventually, rsync will block and generally things
> will decay. Overnight, I'm going to script the console while
> attempting this with nnpfsdeb almost-all set. This is, of course,
> a lot slower than arla normally runs, but I'm hoping someone may be
> able to see the source of the trouble. I'll post the console
> somewhere tomorrow.
>
> Anyway, any hints about debugging arla would be welcome.
>

Some random thoughts:

* If you don't have it yet, get a debug kernel with full vfs sanity
  checking etc.

* Set a breakpoint (or panic) at the lockmgr printf and inspect stack
  trace and other live threads.

* See if you can run into similar problems using arla's tests, if
  you're lucky there will be a faster way to trigger it.

* Perhaps you can cut down on almost-all. Not sure how much. Of
  course, there's always the risk that timing changes with nnpfsdebug
  on.

* Try arlad --tracefile=foo.trace (in the cache dir) and cat it to
  nnpfs/readtrace.py to decipher it when you're done. It's fast and
  gives a complete log of arlad-nnpfs communication.

Hope this helps
/t
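
For the scripted console log mentioned in the quoted message, a small sketch like the following might help show how many "unlocking unheld lock" warnings each thread produced and where the first one appears. It is independent of arla and of readtrace.py. The function names are hypothetical, and it assumes the log holds the lockmgr message in exactly the form quoted above.

```python
import re
from collections import Counter

# Matches the console message quoted in this thread, e.g.
# "lockmgr: thread 0xc6970840 unlocking unheld lock".
# Assumption: the scripted console log keeps this exact wording.
LOCKMGR_RE = re.compile(
    r"lockmgr: thread (0x[0-9a-fA-F]+) unlocking unheld lock"
)


def count_unheld_unlocks(lines):
    """Return a Counter mapping thread address to number of warnings."""
    counts = Counter()
    for line in lines:
        match = LOCKMGR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def first_warning_index(lines):
    """Return the 0-based index of the first warning line, or None."""
    for index, line in enumerate(lines):
        if LOCKMGR_RE.search(line):
            return index
    return None
```

Passing the lines of the log file to `count_unheld_unlocks` shows whether a single thread is responsible, as in the five identical messages quoted above. The index from `first_warning_index` marks where in the debug output to start reading.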