Date: Thu, 7 Feb 2008 17:21:21 +0100
From: Bernd Walter <ticso@cicely12.cicely.de>
To: ticso@cicely.de, freebsd-alpha@freebsd.org
Subject: Re: DS10L - "processor correctable error"
Message-ID: <20080207162120.GG24583@cicely12.cicely.de>
In-Reply-To: <20080207154024.GA9605@mech-aslap33.men.bris.ac.uk>
References: <20080206121738.GA91825@mech-aslap33.men.bris.ac.uk> <20080207145311.GF24583@cicely12.cicely.de> <20080207154024.GA9605@mech-aslap33.men.bris.ac.uk>

On Thu, Feb 07, 2008 at 03:40:24PM +0000, Anton Shterenlikht wrote:
> On Thu, Feb 07, 2008 at 03:53:12PM +0100, Bernd Walter wrote:
> > On Wed, Feb 06, 2008 at 12:17:38PM +0000, Anton Shterenlikht wrote:
> > >
> > > "Warning: received processor correctable error."
> > >
> > > What is the meaning of this warning? Something wrong with hardware?
> >
> > This is an ECC memory correction.
> > It is OK to see it once in a while, since even 100% working DRAM has
> > failures from time to time (called the soft error rate) - therefore the
> > need to have ECC in important systems.
> > If however you see a lot of them it is time to replace the faulty
> > memory.
>
> Bernd, thank you.
> Can I know which DIMM (DS10L has 2 DIMMs) is faulty?

Unfortunately not.
IIRC Tru64 and VMS have support for this, but we never had enough
information to handle it, and it is board specific as well.

> If I run SRM memexer I get:
>
> >>>show_status
> ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
> -------- ------------ ------------ ------ --------- ------------- -------------
> 00000001 idle         system            0    0    0             0             0
> 000003ab memtest      memory            6    0    0    5586812928    5586812928
> >>>
> Processor correctable error through vector 630.
>
> Machine Check Logout Frame @ 0x6000 Code = 0x86
>
> Alpha 21264 IPRs (CPU 0):
> I_STAT:        0000000000000000   DC_STAT:       000000000000000C
> C_ADDR:        00000000296287C0   DC1_SYNDROME:  0000000000000000
> DC0_SYNDROME:  000000000000008F   C_STAT:        0000000000000003
> C_STS:         000000000000000A   MM_STAT:       0000000000000000
>
> >>>
>
> The message appears approx. once every other pass.
> The address is always the same.

Don't worry too much about this.
Alphas use the memory in pairs and can correct multiple faulty
bits in a single dataword.
However, you could try to remove and reseat the modules, since
it can happen that a contact is no longer good after that many years.

-- 
B.Walter http://www.bwct.de http://www.fizon.de
bernd@bwct.de info@bwct.de support@fizon.de
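
The observation quoted above, that the reported address is always the same,
can be checked mechanically if the SRM console output has been captured to
a text file. What follows is only a minimal Python sketch under that
assumption. The function names (count_c_addr, summarize) are hypothetical
and not part of any FreeBSD or SRM tool; the only thing taken from the
message is the "C_ADDR:" field as it appears in the logout frame. The sketch
counts how often each address shows up and does not try to decode the
syndrome or status registers.

```python
import re
from collections import Counter

# Matches the C_ADDR field of a logout frame as printed in the quoted SRM
# output, e.g. "C_ADDR: 00000000296287C0". Leading "> " quote marks are fine.
C_ADDR_RE = re.compile(r"\bC_ADDR:\s*([0-9A-Fa-f]{16})")


def count_c_addr(console_text):
    """Return a Counter mapping each reported C_ADDR (as int) to its count."""
    return Counter(int(m.group(1), 16) for m in C_ADDR_RE.finditer(console_text))


def summarize(console_text):
    """Return (total_frames, distinct_addresses) for the captured text."""
    counts = count_c_addr(console_text)
    return sum(counts.values()), len(counts)


if __name__ == "__main__":
    # Sample text copied from the frame quoted in this message, repeated to
    # imitate several passes of the memory test.
    sample = """
    Processor correctable error through vector 630.
    Alpha 21264 IPRs (CPU 0):
    I_STAT: 0000000000000000 DC_STAT: 000000000000000C
    C_ADDR: 00000000296287C0 DC1_SYNDROME: 0000000000000000
    """
    for addr, n in count_c_addr(sample * 3).most_common():
        print(f"{addr:#018x} seen {n} time(s)")
```

If summarize() reports a single distinct address over many frames, that
matches what was described in the quoted text. Several different addresses
would be a different situation from the one discussed here.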