Date:      Tue, 24 May 2005 18:32:09 -0700
From:      Jon Kuster <kwsn@earthlink.net>
To:        Palle Girgensohn <girgen@FreeBSD.org>
Cc:        toby.murray@gmail.com, freebsd-amd64@freebsd.org
Subject:   Re: Panic while running jdk15
Message-ID:  <1116984729.4388.14.camel@jonnyv.kwsn.lan>
In-Reply-To: <24CD85AD72E7F49E3A9AC091@rambutan.pingpong.net>
References:  <1115839640.59966.12.camel@jonnyv.kwsn.lan> <1115965490.59966.18.camel@jonnyv.kwsn.lan> <24CD85AD72E7F49E3A9AC091@rambutan.pingpong.net>

On Wed, 2005-05-25 at 02:25 +0200, Palle Girgensohn wrote:
> --On Thursday, May 12, 2005 23.24.50 -0700 Jon Kuster <kwsn@earthlink.net> 
> wrote:
> 
> > On Wed, 2005-05-11 at 12:27 -0700, Jon Kuster wrote:
> >> After we managed to get jdk15 built and then shipped our box to the
> >> colo, it has started panicking.  We haven't been able to reliably
> >> reproduce this yet, but it always happens when our Java program is doing
> >> its thing.
> >>
> >> kernel trap 12 with interrupts disabled
> >>
> >> Fatal trap 12: page fault while in kernel mode
> >> cpuid = 0; apic id=00
> >> fault virtual address = 0x1c0
> >> fault code = supervisor write, page not present
> >> instruction pointer = 0x8 :0xffffffff80382348
> >> stack pointer = 0x10 :0xffffffff7935aa0
> >> frame pointer = 0x10 :0xffffffff7935ae0
> >> code segment = base 0x0, limit 0xfffff, type 0x1b
> >>              = DPL 0, pres 1, long 1, def32 0, gran 1
> >> processor eflags = resume, IOPL = 0
> >> current process = 6503 (sh)
> >>
> >> I haven't been able to get a dump yet, or even a trace in ddb - our
> >> remote management card apparently emulates a USB keyboard, which doesn't
> >> seem to work when the box has panicked.
> >>
> >> nm -n /boot/kernel/kernel |grep ffffffff803823
> >> ffffffff80382330 T cpu_throw
> >> ffffffff80382380 T cpu_switch
> >
> > We've switched off Hyperthreading (we're running EM64T Xeons), and that
> > seems to have worked around the problem.  It's a little too early to say
> > for sure, but we were seeing panics twice a day, and we haven't had a
> > panic in about a day and a half.
> 
> Hi!
> 
> This looks very similar to our problem. Dell 2850 (i.e. EM64T Xeon, two 
> CPUs). Turning off HTT made it live longer (long enough for me to believe 
> it actually solved the problem), but after a week or so it crashed twice a 
> day again. We're *not* running Java, though. Apache 1.3, php4, 
> postgres8.0.3, amavis (i.e. perl), postfix. Apache, Postgres and PHP are 
> very loaded, the machine has a load >= .8 most of the time (mostly due to 
> sloppy code, but anyway).
> 
> 5.4-RELEASE made it better for a few days, but then it started crashing 
> again. Today, I've built a non-SMP kernel, so we're effectively running a 
> single CPU. It has not crashed so far (but it is slow).
> 
> Always Fatal trap 12: page fault while in kernel mode
> 
> It also hangs and does not reboot by itself. It seems to hang so hard that 
> it never manages to save a core dump, and has to be restarted by hitting 
> the big button.
> 
> Contacted Dell support, as I'm beginning to suspect the hardware. After a 
> BIOS upgrade today, recommended by Dell, the machine hung at userland 
> startup, when starting the various daemons. Five times in a row, at least. 
> Then it decided to actually come up, and stayed up for eight hours. Then it 
> went down again. Sigh...
> 
> If it works fine with one CPU, is it likely to be a hardware problem or 
> a software problem?
> 
> Jon, your report is a few weeks old. What happened? Does it live happily w/o 
> HTT?

We're running a Dell PE1850, which I think is exactly the same as a
2850, except in a smaller case. It's been up ever since we turned off
HTT.  It hasn't been in full production due to some network issues so
it's idle most of the time (and probably will be even when it is in
production), but it's survived running our Java program repeatedly while
a -j8 buildworld was going on.  It also survived an incident where we
ran it out of swap.  I don't want to say turning off HTT solved the
problem, but it has eliminated the panics for us.
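
For anyone wanting to repeat the nm lookup quoted above without grepping
by hand, here is a minimal sketch of mapping the faulting instruction
pointer to the nearest preceding symbol.  The helper names (parse_nm,
resolve) are made up for illustration, and the two symbol lines are just
the ones from the nm output above.  A real lookup needs the full
`nm -n /boot/kernel/kernel` output from the exact kernel that panicked.

```python
import bisect

def parse_nm(lines):
    """Parse `nm -n` style lines ("address type name") into sorted (addr, name) pairs."""
    symbols = []
    for line in lines:
        parts = line.split()
        # Undefined symbols have no address column, so skip anything
        # that is not exactly three fields.
        if len(parts) != 3:
            continue
        addr, _sym_type, name = parts
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue
    symbols.sort()
    return symbols

def resolve(symbols, ip):
    """Return (name, offset) of the last symbol at or below ip, or None."""
    addrs = [addr for addr, _name in symbols]
    i = bisect.bisect_right(addrs, ip) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, ip - addr

# Only the two symbols quoted in this thread (an illustrative subset).
NM_OUTPUT = """\
ffffffff80382330 T cpu_throw
ffffffff80382380 T cpu_switch
"""

symbols = parse_nm(NM_OUTPUT.splitlines())
name, offset = resolve(symbols, 0xffffffff80382348)
print(f"{name}+{offset:#x}")  # cpu_throw+0x18
```

With the values from the panic message, this puts the fault 0x18 bytes
into cpu_throw, which matches what the grep suggested.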

-Jon



