Date:      Wed, 10 Jan 2001 21:09:33 -0600
From:      David Kelly <dkelly@hiwaay.net>
To:        Brett Glass <brett@lariat.org>
Cc:        "FreeBSD Chat List" <freebsd-chat@FreeBSD.ORG>
Subject:   Re: ECC worth the extra cost for SOHO server? 
Message-ID:  <200101110309.f0B39XR67379@grumpy.dyndns.org>
In-Reply-To: Message from Brett Glass <brett@lariat.org>  of "Wed, 10 Jan 2001 14:04:24 MST." <4.3.2.7.2.20010110140119.04980410@localhost> 

Brett Glass writes:
> At 10:03 PM 1/9/2001, Francisco Reyes wrote:
> 
> >It probably doesn't support ECC.
> 
> I've got an e-mail in to Via Technologies, but I am not hopeful
> that the chipset supports ECC. It's a shame.... When you're doing
> up to 133 million accesses a second, it's nice to have some
> safeguards. But AMD and Via need to get into the mainstream 
> consumer market, where price is an issue and ECC will rarely be 
> used. I think that you can use the ECC-enabled North Bridge chip
> from the KX-133 chipset if you really need ECC.

An interesting thing came with my new G4 PowerMac: a diagnostic CDROM.
Haven't done anything with it yet. Apple designs fantastic hardware, but
it too skimps on parity/ECC. Somebody needs to get through to the morons
making these bad decisions and explain something: I don't want ECC just
because I see it in "servers" and therefore want the same thing. Over
the years I have had several instances of bad memory in my personal
machines, and many more problems at work. It's a bitch to diagnose. I've
seen the "pros" screw up and get it wrong. When shopping for PC parts I
see very little additional cost for ECC on the MB; most of the cost is
in the memory. I want ECC so the hardware can tell me it's sick. The
fact that it may be self-healing is secondary.

In 1991-1992, in my second temporary career as a sysadmin, I had an
installation of (12) SGI 4D-3xx machines with a total of 38 CPUs. (9)
machines had a huge memory board with (64) 1M SIMMs, (3) had two of
these for a total of 128MB. This was the first time I started paying
attention to ECC and the corrections. One machine had a correction every
24 hours or so. Another never. The rest were somewhere in between. SGI
only latched a single ECC event, so I polled the latch every 15 minutes
from cron and cleared it so I could accurately log the next one. SGI's
position was "if it doesn't crash it's not broken." I never could
associate a crash with the memory.
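
A minimal sketch of the poll-and-clear scheme described above, assuming
hardware that remembers only one ECC event until it is cleared. The
class, function names, and addresses are hypothetical stand-ins, not
SGI's interface; it only illustrates why clearing the latch on every
poll matters.

```python
from datetime import datetime, timedelta


class SingleEventLatch:
    """Hypothetical stand-in for hardware that remembers only one ECC event."""

    def __init__(self):
        self._event = None

    def record(self, address):
        # Only the first event is kept until the latch is cleared.
        if self._event is None:
            self._event = address

    def read_and_clear(self):
        # Return the latched event (or None) and re-arm the latch.
        event, self._event = self._event, None
        return event


def poll(latch, now, log):
    """One cron-style poll: log the latched event, if any, then re-arm."""
    event = latch.read_and_clear()
    if event is not None:
        log.append((now, event))
    return event


latch = SingleEventLatch()
log = []
start = datetime(1992, 1, 1)  # made-up timestamp for illustration

latch.record(0x1000)
latch.record(0x2000)  # lost: the latch already holds an event
poll(latch, start, log)
latch.record(0x3000)
poll(latch, start + timedelta(minutes=15), log)
poll(latch, start + timedelta(minutes=30), log)  # nothing latched, nothing logged
```

Anything that happens between two polls after the first latched event is
lost, so the polling interval bounds how much can be missed.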

In my third temporary career as a sysadmin, 1996-1998, the machines were
physically much smaller, much faster, about the same memory. About half
Sun, half SGI. From 20 to 40 systems. And I had a lot more problems with
memory. SGI and Sun were both logging ECC events to the normal syslog
chain, complete with diagnostics pointing out the exact address and
memory slot where the problem originated. When the same event repeated
within a week, I had no problems getting Sun or SGI to replace the
component. Unlike before, these systems sometimes crashed on memory
failure, because more than one bit would fail.
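
The "same event repeated within a week" rule can be sketched as a small
check over logged events. This assumes the events have already been
parsed into (timestamp, slot) pairs; the slot names and dates below are
made up, and no real Sun or SGI syslog format is implied.

```python
from datetime import datetime, timedelta


def repeated_within(events, window=timedelta(days=7)):
    """Return the slots that logged two ECC events no more than `window` apart."""
    last_seen = {}
    flagged = set()
    for when, slot in sorted(events):
        previous = last_seen.get(slot)
        if previous is not None and when - previous <= window:
            flagged.add(slot)
        last_seen[slot] = when
    return flagged


# Made-up sample data: slot A repeats after 3 days, slot B after 7 weeks.
events = [
    (datetime(1997, 3, 1), "slot A"),
    (datetime(1997, 3, 4), "slot A"),
    (datetime(1997, 3, 2), "slot B"),
    (datetime(1997, 4, 20), "slot B"),
]
print(sorted(repeated_within(events)))
```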

Shutting down an intermittent system to run system diagnostics never
detected these problems. Such is the nature of memory problems. I want
ECC on my machines if only for the continuous diagnostics it provides.
I don't consider it much different from using AsusProbe to monitor CPU
temperature and fan speeds.

I demanded ECC memory in my rebuilt computer, so the local PC shop put
it in, into an Asus A7V with KT133. Then I found the board doesn't do
ECC.  :-( It took 3 evenings of juggling the BIOS config before the
machine would survive "make buildworld" and moderate ethernet activity.
All that time I was wondering if I had a memory problem. I think the
problem was "PCI Master Read Caching" and/or "Delayed Transaction". The
shop enabled both. The BIOS defaults both to disabled. The system is
reliable with the defaults.

--
David Kelly N4HHE, dkelly@hiwaay.net
=====================================================================
The human mind ordinarily operates at only ten percent of its
capacity -- the rest is overhead for the operating system.