Date:      Mon, 3 Nov 2008 23:30:42 +0100
From:      Marius Strobl <marius@alchemy.franken.de>
To:        Mark Linimon <linimon@lonesome.com>
Cc:        freebsd-sparc64@freebsd.org
Subject:   Re: Free Ultra2 in Silicon Valley, USA
Message-ID:  <20081103223042.GB8256@alchemy.franken.de>
In-Reply-To: <20081031131827.GA9613@soaustin.net>
References:  <20081031124442.GB9102@soaustin.net> <183638.12752.qm@web56802.mail.re3.yahoo.com> <20081031131827.GA9613@soaustin.net>

On Fri, Oct 31, 2008 at 08:18:27AM -0500, Mark Linimon wrote:
> On Fri, Oct 31, 2008 at 06:02:28AM -0700, mdh wrote:
> > A dual CPU Ultra2 is going to be a lot more powerful than an Ultra5.
> 
> Hmm, ok, didn't know that.  If no-one else claims it first, perhaps
> I can claim it for the build cluster and pull one of the Ultra 5s.
> I intend to be out in .ca.us for meetBSD.
> 
> > E4500's can be relatively beefy.
> 
> We could have gotten our hands on some more of them in .ca.us but the
> problem is who wants to pay for the power :-(  Really, their time has
> come and gone.
> 
> > OK, this is probably way over my head, but I'll bite - what exactly
> > happens if you don't breakpoint through it?  
> 
> http://people.freebsd.org/~linimon/studies/dmesgs/dmesg.netra_1_t200.txt .
> 
> This appears to be some kind of race condition; my guess from fooling
> around with it is that some interrupt is enabled, and then fires, before
> the setup to handle it is finished.  (Note that the same kernel runs
> fine on the 100s).  By stepping through it, you can see it fail at
> different locations; without stepping through it, it is always at
> the same.
> 
> Unfortunately my notes are at home and that machine is unreachable ATM.
> 

It's more likely that a device is exceeding the mapping
provided, which triggers the uncorrectable DMA error
interrupt. That interrupt then arrives at different
locations depending on how far the CPU has progressed
since the transfer request was issued to the device.
Anyway, the panic message provided isn't enough info to
even guess at the real cause. I think the easiest way
to proceed would be to remove the remaining NIC and mass
storage controller drivers one by one and see when the
panic goes away (is there a reason you disabled gem(4)
for the on-board NICs?). I'd begin with just disabling
ATAPI DMA, though (the sparc64 loader meanwhile does
this by default), as ata(4) has a known issue causing
data corruption with the ALI M5229 and ATAPI DMA on
sparc64, which might be related to your problem. That
said, my T1 AC200 is running fine and I've never seen
such a problem with it...
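
For reference, a minimal sketch of turning ATAPI DMA off at
boot. It assumes the ata(4) loader tunable hw.ata.atapi_dma
is honoured by the kernel in use; check ata(4) on the
installed release before relying on it.

```
# /boot/loader.conf
# Assumption: hw.ata.atapi_dma is the ata(4) tunable on this release.
# 0 = use PIO for ATAPI devices, 1 = use DMA.
hw.ata.atapi_dma="0"
```

For a one-off test the same variable can presumably be set at
the loader prompt ("set hw.ata.atapi_dma=0", then "boot")
without editing any file.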

Marius



