From owner-cvs-all@FreeBSD.ORG  Tue Jan 20 10:29:29 2004
Return-Path: <owner-cvs-all@FreeBSD.ORG>
Delivered-To: cvs-all@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 618)
	id 4573C16A4CF; Tue, 20 Jan 2004 10:29:29 -0800 (PST)
In-Reply-To: <20040120175334.W3279@gamplex.bde.org> from Bruce Evans at "Jan
	20, 2004 06:11:25 pm"
To: bde@zeta.org.au (Bruce Evans)
Date: Tue, 20 Jan 2004 10:29:29 -0800 (PST)
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-Id: <20040120182929.4573C16A4CF@hub.freebsd.org>
From: wpaul@FreeBSD.ORG (Bill Paul)
cc: cvs-src@FreeBSD.org
cc: src-committers@FreeBSD.org
cc: phk@FreeBSD.org
cc: cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/alpha/alpha support.s src/sys/i386/i386
	swtch.s src/sys/kern kern_shutdown.c src/sys/sys systm.h
X-BeenThere: cvs-all@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: CVS commit messages for the entire tree <cvs-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-all>,
	<mailto:cvs-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-all>
List-Post: <mailto:cvs-all@freebsd.org>
List-Help: <mailto:cvs-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-all>,
	<mailto:cvs-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Jan 2004 18:29:29 -0000


[abuse of __FILE__ and __LINE__ in panic()]
 
> This was rejected in all reviews.  It gives less information than
> grepping the sources, at some cost (grep at least gives correct line
> numbers when the sources don't quite match the binary).
> 
> Please back this out.

I have to second this.
 
> >   Ideally a traceback should be printed too, any takers ?
> 
> This can be obtained by running a debugger on the panic dump.  Line
> numbers and source file names can also be printed by the debugger
> if the executable has at least line numbers in its debugging info.

That's fine for developers who always run their systems with crash
dumps enabled and posess sufficient debugger fu to analyze them. What
about ordinary users who encounter a kernel panic due to a trap
and need to report it? It would save wear and tear on everyone's
nerves (_ESPECIALLY_ mine) if they could just send in a traceback
rather than developers being forced to do the usual "go back and
do nm ${KERNEL} and tell us what you see" exchange.

[...]

> Verbose panic messages, and lots of code to print out values of variables
> just before panicing, are another mistake.  Short panic messages were
> good enough when debuggers were primitive or nonexistent.  Verbose panic
> messages are even less needed now.

Ok, hang on just a minute.

Let's be clear here. Adding linenum/sourcefile info to all panic()
calls is kind of pointless, I agree. (If you're too lazy to grep
the source for the panic message, you're just no damn good.) However,
there are times when printing the contents of some variables in
the panic string is _INCREDIBLY_ _USEFUL_. Terse panic messages
are annoying -- for that matter so are terse error messages in general.
Error messages are meant to a) inform the user that they need to
perform some action to correct a problem and b) alert a software
developer to a potential bug and, hopefully, how to fix it. It is
not enough to note that an error occured: you have to explain
_WHY_ it occured.

Really bad panic message:

	panic("mbuf is corrupt (but I'm not telling you exactly "
	    "what's corrupt about it because god forbid I should "
	    "make life easier for you)");

Good panic message:

	panic("mbuf is corrupt: mbuf %p has m_len of %d and m_flags of 0x%x "
	    "but m_data is NULL", m, m->m_len, m->m_flags);

By showing the contents of the suspicious variable/structure, you give
a developer some clue as to what was going on in the system when the 
variable/structure was trashed. Maybe m_len is the size of a protocol
header that points to a particular protocol module being suspect.
Maybe m_flags has M_DONGS set which points to code that Alfred wrote.
It may not make sense to dump out every single piece of available state,
but not providing any state at all is just damn rude.

Could you determine this from the crash dump? Sure -- but do you have
any idea how much of a pain in the ass that can be, especially when
the crash occurs on a system that isn't yours? By default, crash
dumps are as large as physical memory, and most systems now have at
least 128MB of RAM. The average user does not know how to analyze
a crash dump, which means even if they get one, they won't be able to
do anything with it, except send it to me. That means I'll have people
throwing 128MB files at me, which I can't even analyze unless I happen
to have a system handy running _exactly_ the same kernel as they are.
This is an amazing amount of trouble to go through to determine a
couple of pieces of info that could just as easily have been printed
on the console, where the user could copy them down and include them
in an e-mail.

Note that I am not saying we should immediately go back throug the
kernel source and verbosify every single panic() call we come across.
I _am_ saying that developers should not use existing panic() messages
as models for their own. I am also not going to let you lead them
down that path, for that way lies Linux^Wmadness.

Anyway, to reiterate: I don't think the lineno/sourcefile additions
to panic() are worth the bloat. Tracebacks on the other hand, would
be worth their weight in gold bikesheds.

-Bill

--
=============================================================================
-Bill Paul            (510) 749-2329 | Senior Engineer, Master of Unix-Fu
                 wpaul@windriver.com | Wind River Systems
=============================================================================
              <adamw> you're just BEGGING to face the moose
=============================================================================