Date:      Mon, 09 Sep 2002 20:40:02 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Greg 'groggy' Lehey <grog@FreeBSD.org>
Cc:        Stacy Millions <stacy@millions.ca>, hackers@FreeBSD.org
Subject:   Re: I climb the mountain seeking wisdom
Message-ID:  <3D7D6992.C98B8009@mindspring.com>
References:  <XFMail.20020906135858.jhb@FreeBSD.org> <3D78F291.8010005@millions.ca> <20020908064449.GG46846@wantadilla.lemis.com> <3D7D212E.6030601@millions.ca> <20020910021732.GB20691@wantadilla.lemis.com> <3D7D639A.99E034EC@mindspring.com> <20020910032014.GG20691@wantadilla.lemis.com>

Greg 'groggy' Lehey wrote:
> On Monday,  9 September 2002 at 20:14:34 -0700, Terry Lambert wrote:
> > Greg 'groggy' Lehey wrote:
> >> There will always be situations where the debugger can't catch the
> >> problem in time.  Then it's up to you to guess and put a breakpoint
> >> just before it freezes; this can be an iterative process.  The method
> >> requiring the least thought is to single step over function calls
> >> until the system freezes.  Then you know which function it happened
> >> in.  Reboot, set a breakpoint in that function, and repeat.
> >
> > Dumping a bunch of printf's in, with "Here 1\n", "Here 2\n", and so
> > on will find this problem a lot faster than an equivalent number of
> > reboots.  8-).
> 
> That depends on how well you use each tool.

My timing for a reboot is about 30 seconds; a build and reboot
is about 4 minutes.

The problem with a debugger that needs a reboot before you can
try again, because it only lets you track the problem after it
occurs, is that, even with a binary search, you are looking at a
total of 30 * log2(n) seconds to narrow the failure down to 1/n
of the code path.  The other problem is that you're already in
trouble, because a binary search divides that granularity up
linearly, and the code is not going to execute that way.  So you
end up needing to know a lot of function names/addresses to set
your breakpoints, and then you have to watch them.
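To make the arithmetic concrete, here is a rough cost model in C.  It
assumes each probe costs exactly one reboot and halves the remaining
suspect region, so narrowing to 1/n of the code path takes ceil(log2(n))
probes.  The function names are hypothetical, chosen for illustration.

```c
/*
 * Rough cost model for bisecting a hang with reboots (hypothetical
 * helper names).  Assumes each probe costs one reboot and halves the
 * remaining suspect region, so narrowing to 1/n takes ceil(log2(n))
 * probes.
 */
static unsigned bisect_probes(unsigned long n)
{
    unsigned probes = 0;
    unsigned long m;

    if (n <= 1)
        return 0;            /* already narrowed: nothing to search */

    /* ceil(log2(n)) equals the number of significant bits in n - 1. */
    for (m = n - 1; m != 0; m >>= 1)
        probes++;
    return probes;
}

static unsigned long bisect_cost_secs(unsigned long n,
    unsigned long secs_per_probe)
{
    return (unsigned long)bisect_probes(n) * secs_per_probe;
}
```

With the 30-second reboot quoted above, bisect_cost_secs(1024, 30)
gives 10 probes, or 300 seconds; if each probe instead needs a
4-minute build and reboot, bisect_cost_secs(1024, 240) gives 2400
seconds.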

The real pain is that the debugger doesn't print a visual
history of the continuation events, so you can't just break on
everything, and then put a screwdriver in your keyboard to
repeat continuations until failure, to get a function address
list.

Actually, the absolute easiest way is to use the compiler
profiling options, and printf the function address on entry
and exit.  The problem with doing this globally with all the
kernel code is that you can't use printf until after the
console is started, so you have to pick what to compile with
the option, and what to omit it from.  The code is really
poorly organized for this approach, so I didn't mention it
last time.  But if you do this, you will, in effect, get a
"truss" of the function execution within the kernel.
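One concrete way to get entry/exit hooks is the -finstrument-functions
option in GCC and Clang, which makes the compiler call
__cyg_profile_func_enter() and __cyg_profile_func_exit() around every
instrumented function.  Whether this is the profiling option meant here
is an assumption (-pg with an mcount hook is another route).  The sketch
below is userland C; the trace_* variables are hypothetical names.  In a
kernel the hooks would call the kernel printf instead and, as noted
above, could only print once the console is up.

```c
#include <stdio.h>

/* Hypothetical trace state, kept so a debugger can also inspect it. */
static unsigned long trace_events;  /* total entry + exit events seen */
static int trace_depth;             /* current call nesting depth */
static void *trace_last_fn;         /* most recently entered function */

/*
 * The hooks themselves must not be instrumented, or every call would
 * recurse into itself.  stderr is unbuffered, so output is not lost
 * if the process dies right after a line is printed.
 */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    trace_events++;
    trace_last_fn = this_fn;
    fprintf(stderr, "%*s-> %p (from %p)\n",
        trace_depth * 2, "", this_fn, call_site);
    trace_depth++;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)call_site;
    trace_events++;
    if (trace_depth > 0)
        trace_depth--;
    fprintf(stderr, "%*s<- %p\n", trace_depth * 2, "", this_fn);
}
```

Compile the code under test with -finstrument-functions and link these
hooks in; the printed addresses can then be mapped back to function
names with a tool such as addr2line or nm.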

The VAX/VMS and Windows debuggers have this same problem,
where if you crash, you have to figure out where the crash
minus one instruction is, in order to have sufficient data
to match the crash.  Even when you narrow it down enough
that you can step forward to the crash, since it's a hard
crash, you can't examine the values after, only before, so
it's still a pain in the neck that takes one or two more
reboots (or more) to track down.

Dropping a bunch of printf's, even if you have to rebuild a
couple of times to narrow it sufficiently, ends up being a
lot easier.  8-).
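A minimal sketch of the "Here 1", "Here 2" breadcrumb approach, in
userland C.  HERE() is a hypothetical macro, not an existing FreeBSD
interface; instead of hand-numbering each message it prints the file,
line, and function it was expanded in, and records the line of the most
recent breadcrumb in a variable that a debugger can read after a hang.

```c
#include <stdio.h>

/* Line number of the last breadcrumb reached (hypothetical name). */
static int here_last_line;

/*
 * Hypothetical breadcrumb macro.  fflush() matters in userland when
 * stdout is redirected to a file or pipe and therefore fully buffered;
 * kernel printf does not need it.
 */
#define HERE() do {                                                   \
        here_last_line = __LINE__;                                    \
        printf("Here %s:%d (%s)\n", __FILE__, __LINE__, __func__);    \
        fflush(stdout);                                               \
} while (0)

/* Example function sprinkled with breadcrumbs. */
static int example_step(int x)
{
    HERE();
    x *= 2;
    HERE();
    return x + 1;
}
```

The last "Here" line printed before a freeze brackets the failure;
moving the breadcrumbs closer together on the next build narrows it
further.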

-- Terry
