Date:      Thu, 6 Mar 1997 11:07:27 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        ponds!rivers@dg-rtp.dg.com (Thomas David Rivers)
Cc:        hackers@freebsd.org
Subject:   Re: "dup alloc" - nope - kern/2875 wasn't it.
Message-ID:  <199703061807.LAA13715@phaeton.artisoft.com>
In-Reply-To: <199703061133.GAA06021@lakes.water.net> from "Thomas David Rivers" at Mar 6, 97 06:33:04 am

> > I guess it would be worth while to take out the printf's until you can
> > isolate the printf's that "fix" the problem.  Then analyze the effects of
> > the printfs serializing writes.
> 
>  My thinking exactly - I've now gone back to just a pristine kernel and
> I'm trying to find a missing splbio()/splx(), or something along those
> lines... so far, no luck...


I am, of course, unable to duplicate your panics.

I suggest you buckle down and do it the hard way; I'd help if I could
duplicate the problem, or if my changes would not be seen as gratuitous,
but I can't.  Without a problem fix resulting, there's no way I can
prove that eliminating all possible race conditions is a Good Thing(tm)
to those people who aren't getting bitten.

Here is what I suggest; effectively, you will be required to perform
a full branch-path analysis of much of the code, by hand.  If you
have a copy of BattleMap, you could use it some places, but since
most kernel routines are not single-entry/single-exit, I would not
recommend spending the $4000 or so for the software just for this
problem, since it won't help much.


Get a full call path for a single operation mapped out, using whatever
epicycles are necessary in the graph to represent concurrency of the
operations.  You must produce a branch map for each routine involved.
A concurrency occurs wherever:

o	Interrupts are enabled
o	A page fault may occur during processing
o	An operation is queued
o	An operation is dequeued
o	A queue element is allocated
o	A queue element is freed
o	A queue element is potentially reused
o	A sleep occurs
o	A wakeup occurs
o	An operation is queued to a bus master device
o	A bus master device completes an operation
o	A bus master device *cancels* an operation
o	A bus master device *restarts* an operation


Then redzone your maps for all possible "context switches" (quoted to
account for fault-based or interrupt-based processing path reentrancy).

Then bluezone any shared datum in the code path for every possible
cycle.

Whatever is simultaneously in a redzone and a bluezone is a possible
problem.  One of them is *the* problem.
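
As a rough illustration of the overlap test, here is a small sketch in
C.  It is not taken from the kernel; the struct, its fields, and
find_suspects() are made-up names, and it assumes you have already
reduced a hand-drawn call path to a flat list of steps, each marked red
(a "context switch" can occur there) and/or blue (a shared datum is
touched there).

```c
#include <assert.h>
#include <stddef.h>

/* One step on a hand-mapped call path.  Hypothetical layout, used only
 * for this sketch. */
struct step {
    const char *label;  /* what the step does */
    int red;            /* nonzero: a "context switch" can occur here */
    int blue;           /* nonzero: a shared datum is touched here */
};

/*
 * Store the indices of all steps that are both red and blue in out[]
 * and return how many there are.  out[] must have room for n entries.
 */
size_t find_suspects(const struct step *path, size_t n, size_t *out)
{
    size_t found = 0;

    assert(path != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (path[i].red && path[i].blue)
            out[found++] = i;
    }
    return found;
}
```

Every index it returns is a candidate, nothing more; the marking of the
steps is still the by-hand part, and the sketch is only as good as that
marking.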

Adjust the redzones to add reentrancy protection (probably via spl)
so that they do not overlap the bluezones.
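
A user-space sketch of what that adjustment looks like, assuming the
usual "s = splbio(); ... splx(s);" idiom.  The splbio() and splx()
below are stand-ins written for this example so that it compiles
outside a kernel; the queue variables and fake_disk_intr() are likewise
hypothetical, and the "interrupt" is just a function call placed where
a real one might arrive.

```c
#include <assert.h>

/* Stand-ins for the kernel's priority levels and spl routines.  These
 * are assumptions for the sketch, not the real implementations. */
enum { LEVEL_NONE = 0, LEVEL_BIO = 1 };
static int cur_level = LEVEL_NONE;

int splbio(void)
{
    int old = cur_level;

    if (cur_level < LEVEL_BIO)
        cur_level = LEVEL_BIO;
    return old;
}

void splx(int s)
{
    cur_level = s;
}

/* Shared data (the bluezone): count must always equal tail - head. */
static int head, tail, count;
static int torn_views;  /* times the "interrupt" saw a half-updated queue */

/* Simulated disk interrupt; held off while at LEVEL_BIO or above. */
static void fake_disk_intr(void)
{
    if (cur_level >= LEVEL_BIO)
        return;
    if (count != tail - head)
        torn_views++;
}

int torn_view_count(void)
{
    return torn_views;
}

/* Redzone overlaps the bluezone: the interrupt lands mid-update. */
void enqueue_unprotected(void)
{
    tail++;
    fake_disk_intr();
    count++;
}

/* Same update, bracketed so the redzone is pushed past the bluezone. */
void enqueue_protected(void)
{
    int s = splbio();

    tail++;
    fake_disk_intr();   /* same point as above, but now blocked */
    count++;
    assert(count == tail - head);
    splx(s);
    fake_disk_intr();   /* delivered here, sees a consistent queue */
}
```

Calling enqueue_unprotected() once bumps the torn-view counter;
calling enqueue_protected() does not, because the only place the
simulated interrupt gets through is after the update is complete.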

The problem should go away.

This would be a lot easier if the code were datum-prime instead of
procedure-prime, but no one respects dataflow any more but us old
theorists.  8-(.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


