Date:      Fri, 29 May 2009 13:22:59 -0500 (CDT)
From:      Larry Rosenman <ler@lerctr.org>
To:        Kip Macy <kmacy@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: ZFS Crash
Message-ID:  <alpine.BSF.2.00.0905291319510.79532@thebighonker.lerctr.org>
In-Reply-To: <alpine.BSF.2.00.0905291251270.78337@thebighonker.lerctr.org>
References:  <alpine.BSF.2.00.0905250040230.1781@borg> <3c1674c90905242253n544c3f0cqb10952f349391ce7@mail.gmail.com> <454b8cc37c60ab7af2663ba70ddbfd59.squirrel@webmail.lerctr.org> <5a9a181a12e9e4ef864d23ae063f7277.squirrel@webmail.lerctr.org>  <alpine.BSF.2.00.0905250803350.79867@borg> <alpine.BSF.2.00.0905260702300.1820@borg> <3c1674c90905280055h740bce23p33b18fefacf31196@mail.gmail.com> <alpine.BSF.2.00.0905280724480.58845@borg> <alpine.BSF.2.00.0905291242060.77764@thebighonker.lerctr.org> <alpine.BSF.2.00.0905291251270.78337@thebighonker.lerctr.org>

On Fri, 29 May 2009, Larry Rosenman wrote:

> On Fri, 29 May 2009, Larry Rosenman wrote:
>> 
>> Ok, it just crashed.  Unfortunately, I'm at work and the box is at home.
>> 
>> I did have my script running every minute of that entire boot.
>> 
>> What I saw was a full backup running, and then we started paging, and then
>> the backup jobs got pager errors, and were killed.
>> 
>> I'm not sure what else went on, so I restarted the bacula daemons that
>> got killed, and was in the bacula console when it died.
>> 
>> I'll see if I can get a cell-phone camera shot of the console.
>> 
>> I'll also tar up the vmstat outputs and put them on my web server.
>> 
>> What other forensics should I get?  Bear in mind the system is probably
>> locked up with no dump taken :(
> One other "interesting" thing is the IPMI card seems to also be locked up. 
> I.E. if I try to login to it, it just hangs after giving id/pw.
>
Ok, I let the IPMI sit, and it eventually showed me the console.

I took a screenshot, and then reset the box.  I did get a textdump,
but it didn't run my ddb scripts.

Here are links to what I do have:
http://www.lerctr.org/~ler/ZFS_CRASH/

$ ls -l
total 5017
-rw-r--r--  1 ler  ler  5051465 May 29 13:14 crash.stats.tar.gz
-rw-r--r--  1 ler  ler      253 May 29 13:18 index.html
-rw-r--r--  1 ler  ler    77004 May 29 13:16 ipmiconsole.png
-rw-r--r--  1 ler  ler    70656 May 29 13:14 textdump.tar.6
$

What else can I supply?

The crash.stats.tar.gz contains the minute-by-minute output of the following
script for the entire boot:
-----
#!/bin/sh
# Snapshot uptime and kernel memory statistics into a timestamped file.
DATE=$(date +%Y%m%d.%H%M%S)
(echo "Uptime:"; uptime; echo "vmstat -m:"; vmstat -m
  echo "vmstat -z:"; vmstat -z) >"/home/ler/stats/${DATE}.stats"
-----
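
For anyone digging through the stats, the snapshots closest to the crash
are probably the most interesting ones.  Here is a rough sketch for listing
them.  It assumes the tarball has been unpacked into a directory of
*.stats files (I haven't described the layout inside the tarball, so
adjust the path), and latest_stats is just a made-up helper name, not
part of the script above.  It relies only on the YYYYmmdd.HHMMSS file
names sorting in time order:

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): print the N most
# recent snapshot files in a stats directory.
# Assumption: files are named <YYYYmmdd.HHMMSS>.stats, so a plain
# lexical sort is also a chronological sort.
latest_stats() {
    dir=$1
    count=${2:-5}    # default to the last 5 snapshots
    # Prints nothing if the directory has no *.stats files.
    ls "$dir"/*.stats 2>/dev/null | sort | tail -n "$count"
}
```

Something like "latest_stats /home/ler/stats 10" would then list the ten
snapshots taken just before the box went down.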


-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683                 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893


