Date:      Fri, 29 Aug 2008 11:44:54 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-current@freebsd.org
Cc:        Kirk Strauser <kirk@strauser.com>
Subject:   Re: System, diagnose thyself: auto-documentation for crashes
Message-ID:  <200808291144.54193.jhb@freebsd.org>
In-Reply-To: <BDDFB834-C15F-4E48-B1D1-B644940FBE42@strauser.com>
References:  <BDDFB834-C15F-4E48-B1D1-B644940FBE42@strauser.com>

On Friday 29 August 2008 11:13:57 am Kirk Strauser wrote:
> I was having flaky system problems that were driving me to  
> distraction.  Yesterday, I finally got a panic message with an  
> instruction pointer, used addr2line to see that the failure was in  
> uma_zfree_internal, searched Google, and learned that it was probably  
> due to bad RAM.  Half an hour later, memtest86 found the defective  
> stick and the problem was solved.
> 
> This led me to thinking, though: the OS already had all the  
> information needed to figure out where the problem was.  If there had  
> been an explanation inside that function definition, FreeBSD could  
> have automatically gone to the file, searched for that explanation,  
> and told me why my system had probably crashed.
> 
> I propose that we:
> 
> 1) Settle on a standard comment format for metainformation.  There are  
> already standards like Doxygen if we don't want to roll our own.
> 
> 2) Write a program that takes an instruction pointer and outputs the  
> comment for the associated function.
> 
> 3) Modify /etc/rc.d/savecore to run the program from #2.
> 
> For instance, suppose the comments in sys/vm/uma_core.c looked like:
> 
> /*
>   * Frees an item to an INTERNAL zone or allocates a free bucket
>   *
>   * Arguments:
>   *      zone   The zone to free to
>   *      item   The item we're freeing
>   *      udata  User supplied data for the dtor
>   *      skip   Skip dtors and finis
>   *
>   * Failure:
>   *      Failures in this function are commonly due to defective RAM.
>   */
> static void
> uma_zfree_internal(uma_zone_t zone, void *item, void *udata,
>      enum zfreeskip skip, int flags)
> {
> ...
> }
> 
> If I'd seen that failure message in my syslog, I would have avoided a  
> few days of teeth gnashing.  What do you think?  I think something  
> like this could be extremely useful.  Benefits:
> 
>   - There would be zero impact on performance because it would only  
> touch comments and not any running code whatsoever.
>   - It would require minimal work.
>   - It could be done incrementally.  Document known common failure  
> points and add others with time.
>   - It wouldn't affect any other systems.
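
A minimal sketch of step 2 of the proposal, written in Python for brevity.
It assumes the two-line output that GNU addr2line prints when run with -f
(function name, then file:line) and the "Failure:" comment layout from the
example above.  The names parse_addr2line, failure_note and SAMPLE are
hypothetical; none of this is part of FreeBSD or of crashinfo.

```python
import re

# Sample source text, copied from the comment format proposed in the message.
SAMPLE = r"""
/*
 * Frees an item to an INTERNAL zone or allocates a free bucket
 *
 * Arguments:
 *      zone   The zone to free to
 *      item   The item we're freeing
 *      udata  User supplied data for the dtor
 *      skip   Skip dtors and finis
 *
 * Failure:
 *      Failures in this function are commonly due to defective RAM.
 */
static void
uma_zfree_internal(uma_zone_t zone, void *item, void *udata,
     enum zfreeskip skip, int flags)
{
}
"""


def parse_addr2line(output):
    """Split `addr2line -f` output into (function name, file:line)."""
    lines = output.strip().splitlines()
    return lines[0].strip(), lines[1].strip()


def failure_note(source, function):
    """Return the 'Failure:' text from the block comment that directly
    precedes the definition of `function`, or None if there is none."""
    pattern = (r"/\*((?:(?!\*/).)*)\*/"   # one block comment, captured
               r"[\w\s*]*?\b"              # storage class and return type
               + re.escape(function) + r"\s*\(")
    match = re.search(pattern, source, re.DOTALL)
    if match is None:
        return None
    note = []
    in_failure = False
    for raw in match.group(1).splitlines():
        # Drop the leading " * " decoration of each comment line.
        line = raw.strip().lstrip("*").strip()
        if line.endswith(":") and " " not in line:
            # A section header such as "Arguments:" or "Failure:".
            in_failure = (line == "Failure:")
        elif in_failure and line:
            note.append(line)
    return " ".join(note) or None


if __name__ == "__main__":
    print(failure_note(SAMPLE, "uma_zfree_internal"))
```

A real tool would first run addr2line against the debug kernel to get the
function name and source file, then pass that file's text to failure_note.
Matching on the comment immediately before the definition keeps the parser
independent of any particular documentation standard, at the cost of
missing comments that are separated from the function by other code.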

See /usr/sbin/crashinfo for a start.  I have patches to enable it 
from /etc/rc.d/savecore after generating a patch (still need to test them 
though).

-- 
John Baldwin


