Date:      Fri, 07 Mar 2008 07:43:46 +0100
From:      Tomas Olsson <tol@stacken.kth.se>
To:        Alec Kloss <alec-dated-1205290157.d7dd21@SetFilePointer.com>
Cc:        afs@FreeBSD.org, arla-drinkers@stacken.kth.se, Robert Watson <rwatson@FreeBSD.org>, Garance A Drosehn <gad@FreeBSD.org>, Rasmus Kaj <kaj@kth.se>
Subject:   Re: arla-devel port for FreeBSD (was: Patches to get Arla running on FreeBSD 8-CURRENT)
Message-ID:  <1204872226.4059.15.camel@hippo.t.nxs.se>
In-Reply-To: <20080307024916.GC1911@hamlet.SetFilePointer.com>
References:  <20080223102922.GF38141@hamlet.setfilepointer.com> <20080223110549.GG38141@hamlet.setfilepointer.com> <20080223161249.GH38141@hamlet.setfilepointer.com> <90334B40754BEDC2991E0147@ganymede.hub.org> <p0624081bc3e936674ece@[128.113.24.47]> <20080226061140.GI28956@hamlet.SetFilePointer.com> <20080301210055.GA8919@hamlet.SetFilePointer.com> <20080302161258.L21146@fledge.watson.org> <1204477663.4180.36.camel@hippo.t.nxs.se> <20080303045554.GC8919@hamlet.SetFilePointer.com> <20080307024916.GC1911@hamlet.SetFilePointer.com>

On Thu, 2008-03-06 at 20:49 -0600, Alec Kloss wrote:
> Anyway, Tomas, or others, do you have any hints for me about how
> best to start diagnosing and maybe fixing issues?  The most
> repeatable way I've found to get bad behavior is to rsync -a
> /usr/src and /usr/obj into AFS.  After 30 seconds or so of this,
> I'll start getting messages like these:
> 
>  lockmgr: thread 0xc6970840 unlocking unheld lock
>  lockmgr: thread 0xc6970840 unlocking unheld lock
>  lockmgr: thread 0xc6970840 unlocking unheld lock
>  lockmgr: thread 0xc6970840 unlocking unheld lock
>  lockmgr: thread 0xc6970840 unlocking unheld lock
> 
> on the console.  Eventually, rsync will block and generally things
> will decay.  Overnight, I'm going to script the console while
> attempting this with nnpfsdeb almost-all set.  This is, of course,
> a lot slower than arla normally runs, but I'm hoping someone may be
> able to see the source of the trouble.  I'll post the console
> somewhere tomorrow.  
> 
> Anyway, any hints about debugging arla would be welcome.
> 
Some random thoughts:
 * If you don't have it yet, get a debug kernel with full vfs sanity
checking etc.
 * Set a breakpoint (or panic) at the lockmgr printf and inspect stack
trace and other live threads.
 * See if you can run into similar problems using arla's tests; if
you're lucky there will be a faster way to trigger it.
 * Perhaps you can cut down on almost-all. Not sure how much. Of course,
there's always the risk that timing changes with nnpfsdebug on.
 * Try arlad --tracefile=foo.trace (in the cache dir) and pipe the file
to nnpfs/readtrace.py to decipher it when you're done. It's fast and
gives a complete log of arlad-nnpfs communication.
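
If the console is being scripted overnight anyway, a small helper can
summarise the lockmgr messages per thread before anyone reads the full
log. The following is only a rough sketch, not part of arla or FreeBSD:
the function name is made up for illustration, and the pattern assumes
the message format is exactly the one quoted above.

```python
import re
from collections import Counter

# Assumed message format, taken from the console output quoted above:
#   lockmgr: thread 0xc6970840 unlocking unheld lock
LOCKMGR_RE = re.compile(
    r"lockmgr: thread (0x[0-9a-fA-F]+) unlocking unheld lock"
)


def count_unheld_unlocks(lines):
    """Count 'unlocking unheld lock' messages per thread address.

    `lines` is any iterable of strings, e.g. an open console log file.
    Returns a Counter mapping the thread address (as text) to a count.
    """
    counts = Counter()
    for line in lines:
        match = LOCKMGR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Small demonstration on lines shaped like the ones in the report.
sample = [
    " lockmgr: thread 0xc6970840 unlocking unheld lock",
    " lockmgr: thread 0xc6970840 unlocking unheld lock",
    " some unrelated console line",
]
for thread, n in count_unheld_unlocks(sample).most_common():
    print(thread, n)
```

To use it on a real capture, pass an open file instead of the sample
list. If only one thread address ever shows up, that is a hint about
which thread to look at first in the debugger.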

Hope this helps
		/t



