Date:      Mon, 21 Sep 1998 00:54:28 -0700
From:      Mike Smith <mike@smith.net.au>
To:        Brett Glass <brett@lariat.org>
Cc:        hackers@FreeBSD.ORG
Subject:   Re: Remember those spontaneous crashes I was getting? 
Message-ID:  <199809210754.AAA21394@word.smith.net.au>
In-Reply-To: Your message of "Mon, 21 Sep 1998 00:48:03 MDT." <199809210650.AAA00276@lariat.lariat.org> 

> Well, we still get one every day or two, at odd times. But I can ALWAYS
> make them happen by piping dump through gzip to ftp to a disk on a remote
> machine -- our usual backup procedure.
> 
> Anyway, when I first reported this crash, I was asked what message
> appeared. Unfortunately, it flew by so fast that I couldn't tell what it
> said! So, tonight, seeing that it was a slow night and no users were on, I
> swapped the kernel for one with the debugger enabled and started the backup
> procedure.
> 
> Sure enough, a crash. The screen said:
> 
> Fatal trap 9: general protection fault while in kernel mode
> 
> Instruction pointer = 0x8:0xf0176fb5
> Stack pointer = 0x10:0xf0199000

Are you 100% sure about these numbers?  The kernel stack pointer 
shouldn't be higher than the instruction pointer.  This looks like 
either corrupt code eating %esp or a CPU fault.
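
To make that comparison concrete, here is a minimal sketch that only parses the two selector:offset pairs exactly as reported above and compares them. It assumes the figures were transcribed correctly from the screen. The helper names parse_seg_offset and decode_selector are made up for illustration; they are not part of any kernel or debugger interface.

```python
def parse_seg_offset(text):
    """Split a 'selector:offset' pair such as '0x8:0xf0176fb5' into two ints."""
    selector, offset = text.split(":")
    return int(selector, 16), int(offset, 16)


def decode_selector(selector):
    """Break an x86 segment selector into (table index, table, requested privilege level)."""
    index = selector >> 3                       # bits 3..15: descriptor table index
    table = "LDT" if selector & 0x4 else "GDT"  # bit 2: table indicator
    rpl = selector & 0x3                        # bits 0..1: requested privilege level
    return index, table, rpl


# Values copied from the trap message quoted above.
cs, eip = parse_seg_offset("0x8:0xf0176fb5")
ss, esp = parse_seg_offset("0x10:0xf0199000")

print("code selector :", decode_selector(cs))
print("stack selector:", decode_selector(ss))
print("esp above eip :", esp > eip, "by", hex(esp - eip))
```

With the reported values, the stack pointer sits 0x2204b bytes above the instruction pointer, which is the ordering questioned here.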

> Frame pointer = 0x10:0x0
> Code segment = base 0x0, limit 0xfffff, type 0x1b
>              = DPL 0, pres 1, def32 1, gran 1
> 
> Processor eflags = interrupt enabled, resume, IOPL = 0
> 
> Current process = Idle
> 
> Interrupt mask = 
> 
> kernel: type 9 trap, code = 0
> 
> Stopped at idle_loop_0x3d: jmp idle_loop

There's nothing illegal about that instruction at all; this really looks 
like a memory read error (bad memory, CPU, cache or motherboard).  You 
might have received the GPF because the stack pointer is pointing into 
the kernel text segment (which it probably can't write to).

Corrupting the stack pointer (as opposed to corrupting the contents of 
the stack) is pretty difficult.  It's also very difficult to track 
down. 8(

> As I began to play with the debugger (I really didn't know the commands), I
> saw:
> wd0: interrupt timeout
> wd0: status 50<rdy,seekdone> error 0
> 
> ...which may not have meant anything, but then again....

It just means that you were in the middle of a disk operation, which 
subsequently timed out (because the debugger was running).

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  mike@smith.net.au
\\  The race is long, and in the  \\  msmith@freebsd.org
\\  end it's only with yourself.  \\  msmith@cdrom.com


