Date:      Wed, 9 Apr 2003 20:07:53 +0200
From:      Wilko Bulte <wkb@freebie.xs4all.nl>
To:        Jens Röder <j.roeder@tu-bs.de>
Cc:        freebsd-alpha@freebsd.org
Subject:   Re: alpha/50659: reboot causes SRM console to loop endless error and needs to be restetted hard
Message-ID:  <20030409180753.GB14966@freebie.xs4all.nl>
In-Reply-To: <Pine.HPX.4.33.0304090927390.16819-100000@rzsrv1.rz.tu-bs.de>
References:  <20030408181856.GA10163@freebie.xs4all.nl> <Pine.HPX.4.33.0304090927390.16819-100000@rzsrv1.rz.tu-bs.de>

On Wed, Apr 09, 2003 at 10:56:35AM +0200, Jens Röder wrote:
> 
> 
> Hello Wilko,
> 
> thanks a lot for the kind reply. I will go into more details below:
> 
> 
> On Tue, 8 Apr 2003, Wilko Bulte wrote:
> 
> 
> > >  It ran perfectly under FreeBSD 4.7, but unfortunately the kernel was not
> > >  stable, probably because of memory problems, so I tried 5.0.
> >
> > Do you mean it reports Processor Correctable memory errors? How much memory
> > does it have?
> 
> The machine has about 1 GB RAM. Honestly I am not sure what "processor

1GB... that is overkill for a gateway, but hey, it should not hurt ;)

> correctable memory errors" are, maybe it helps to show the output. That
> was from a self-compiled kernel under 4.7, but I had the same problems when
> trying a GENERIC one.

That is a kernel panic, not a memory problem ;) 

Most Alphas, your AS500 included, have ECC (error-correcting) memory, which
allows single-bit memory errors to be corrected. The kernel will tell you
when a correction was applied; those are the processor correctable errors I
mentioned.
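
To give a feel for how single-bit correction works, here is a minimal
sketch using a toy Hamming(7,4) code. This is only an illustration of the
principle: real ECC memory uses a wider code over whole memory words, and
the function names below are made up for this example.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits as a 7-bit codeword (list of bits, positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 unused, positions 1..7
    code[3], code[5], code[6], code[7] = d
    # Each parity bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_correct(bits):
    """Return (nibble, error_position); position 0 means no error was seen."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1             # flip the single faulty bit back
    d = [code[3], code[5], code[6], code[7]]
    nibble = sum(bit << i for i, bit in enumerate(d))
    return nibble, syndrome


word = hamming74_encode(0b1011)
word[4] ^= 1                            # simulate a single-bit memory error
value, where = hamming74_correct(word)
print(value == 0b1011, where)
```

The nonzero error position is the "a correction was applied" event in this
toy model, which is roughly what the kernel is reporting when it logs a
correctable error.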

> 
> Mar 20 10:54:55 ptchgate /kernel:
> Mar 20 10:54:55 ptchgate /kernel: fatal kernel trap:
> Mar 20 10:54:55 ptchgate /kernel:
> Mar 20 10:54:55 ptchgate /kernel: trap entry = 0x4 (unaligned access fault)
> Mar 20 10:54:55 ptchgate /kernel: a0         = 0xfffffca900010021
> Mar 20 10:54:55 ptchgate /kernel: a1         = 0x2c
> Mar 20 10:54:55 ptchgate /kernel: a2         = 0x11
> Mar 20 10:54:55 ptchgate /kernel: pc         = 0xfffffc00004f8564
> Mar 20 10:54:55 ptchgate /kernel: ra         = 0xfffffc00004942b4
> Mar 20 10:54:55 ptchgate /kernel: curproc    = 0
> Mar 20 10:54:55 ptchgate /kernel:

Unaligned accesses in kernel mode are Bad(TM). Please check the handbook on
how to get more debug information out of the crash.
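
As a quick sanity check on the trap output, here is a small sketch that
tests the reported address for natural alignment. It assumes a0 holds the
faulting virtual address, which is my reading of the trap frame and not
something the log itself states; is_aligned is just a helper name for this
example.

```python
def is_aligned(addr, size):
    """True if addr is a multiple of size (size must be a power of two)."""
    return addr & (size - 1) == 0


fault_addr = 0xfffffca900010021         # a0 from the quoted trap output

# Check against 2-, 4- and 8-byte access sizes.
for size in (2, 4, 8):
    print(size, is_aligned(fault_addr, size))
```

The address is odd, so under that assumption any access wider than one
byte to it would be unaligned.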

> At the moment I am also considering defective memory and will check that as
> soon as I have a temporary replacement for that institute gateway and a night

Very unlikely, this looks like a problem in the kernel to me. 

> Meanwhile I have compiled a kernel with sufficient debugging enabled, in
> the hope of getting proper error messages.

Can you catch a crash dump maybe? 

> I think the "unaligned access" error when listing the firewall rules
> only showed up in the 5.0 version. I will probably downgrade to 4.7 or 4.8
> (which is better to use?), recompile with ipfw2, and let you know then.
> Before that I will try to produce proper error messages with the
> debug kernel of 5.0.

I'd go for 4.8. Do you need any ipfw2 functionality? 

> Maybe you can reproduce the SRM console problem without upgrading to 5.0;
> as I remember, I first noticed it when I booted from floppy or CD and
> told the machine to abort. At first I thought the error was my own fault
> because of the abort. Again, 4.7 did not have that problem.

I have a fresh 4.8 on my AS500 and that does not show me the problem.

What kind of PCI cards are in the machine? Can you post the SHOW CONF
output from the SRM?

Wilko

-- 
|   / o / /_  _   		wilko@FreeBSD.org
|/|/ / / /(  (_)  Bulte				


