Date:      Wed, 20 Sep 2000 13:01:55 +0100
From:      Steve Roome <steve@sse0691.bri.hp.com>
To:        Nik Clayton <nik@FreeBSD.ORG>
Cc:        freebsd-doc@FreeBSD.ORG, James Housley <jim@thehousleys.net>, Mark Ovens <marko@FreeBSD.ORG>, Jeff Gibbons <jgibbons@protogate.com>, Brooks Davis <brooks@one-eyed-alien.net>
Subject:   Re: signal 11 faq entry
Message-ID:  <20000920130155.K8111@moose.bri.hp.com>
In-Reply-To: <20000919185552.B12114@canyon.nothing-going-on.org>; from nik@FreeBSD.ORG on Tue, Sep 19, 2000 at 06:55:52PM +0100
References:  <20000919165723.D8111@moose.bri.hp.com> <20000919185552.B12114@canyon.nothing-going-on.org>

Hi again everyone,

Firstly, I originally only set out here to shed a little bit more
light on _what to do_ in the case of these signal 11 errors. Most
people asking the FREQUENTLY asked question want to know what
they need to do to rectify the problem. I mention this to stem the
tide of all you hardware experts (self included) who would love to
write for "Memory Tester Weekly" or write articles such as "Bizarre
RAM Access patterns cause world's problems"!

I think in this case, the SOLUTION is more important to a FAQ reader
than the explanation. Still, teach a man to fish, and all that...

Memory Testers :
----------------
Brooks Davis mentioned that I should drop my original wording about
memory testers, so I put up the second revision, but there doesn't
seem to be a consensus, as shortly afterwards I received another email
looking at it the other way round. I think that as we don't have
100% agreement on the reliability of memory testers we should leave it
at that: we're not sure, so don't trust them (although they may be
accurate!).

I think there's a niche magazine market e.g. "Memory Tester Buyer",
and I really didn't want to get into the whole screaming messy
nightmare of actually saying what hardware meets my criteria for good
hardware, because it'll be different to a lot of other folks' opinions.

On Tue, Sep 19, 2000 at 06:55:52PM +0100, Nik Clayton wrote:
>    In particular, a dead giveaway that this is *not* a FreeBSD bug is if
>    you see the problem when you're compiling a program, but the activity
>    that the compiler's carrying out changes each time.

I like that bit.

>    For example, suppose you're running "make buildworld", and the
>    compile fails trying to compile ls.c into ls.o.  If you then run "make
>    buildworld" again, and the compile fails in the same place then this
>    is a broken build -- try updating your sources and try again.  If the
>    compile fails elsewhere then this is almost certainly hardware.

But are we rehashing the sig11 faq here ?
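
For what it's worth, the rule of thumb Nik describes above can be
sketched in a few lines. This is only an illustration, not anything
from the FAQ: the function name classify_failures and the idea of
recording "where the build died" as a plain string are my own
assumptions, and real builds can of course fail for reasons this
doesn't cover.

```python
def classify_failures(failure_points):
    """Apply the 'same place or different place?' rule of thumb.

    failure_points is a hypothetical list with one entry per build run:
    the name of the file being compiled when the run died, or None if
    that run completed without error.
    """
    # Only runs that actually failed tell us anything.
    failures = [point for point in failure_points if point is not None]

    if len(failures) < 2:
        # One failure on its own can't distinguish the two cases.
        return "inconclusive"

    if len(set(failures)) == 1:
        # Every run died in the same place: likely a broken build,
        # so update the sources and try again.
        return "broken build"

    # The failure moved around between runs: suspect the hardware.
    return "suspect hardware"


if __name__ == "__main__":
    print(classify_failures(["ls.c", "ls.c"]))   # broken build
    print(classify_failures(["ls.c", "cat.c"]))  # suspect hardware
```

The point of the sketch is just that a single failed run tells you
nothing; it's the comparison between runs that matters.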

>      1.  Hard disks running too hot.  Check the fans in your case.
> 
>      2.  The processor running too hot.  This might be because you've 
> 	 overclocked the processor (in which case, stop doing that).  Or
> 	 the fan on the processor might have died.


Oh, Overclocking : (James' flameproof suit will come in handy here!)

I had written: 
"An overclocked CPU might also exhibit such symptoms. In which case
you should stop overclocking, as it's far cheaper to have a slow
system than a fried system that needs replacing! Also, the wider
community is not often sympathetic to problems on overclocked systems,
whether you believe it's safe or not."

I don't know the reasoning for changing this, and although I was
probably overly long in my section, I was trying to combat an
arrogance (no better word, please don't shoot me!) we tend to find on
the mailing lists that's unneeded.

Ideally we need to be able to politely say to knowing overclockers
that "Yes, we understand that it CAN and does work for some, but
if any sort of problem like this occurs, please try things out at
the standard hardware settings before proceeding; perhaps a change
in version is stressing your overclocked hardware differently and now
it's causing problems..." something like that.

It might be worth mentioning that retailers are not always scrupulous.
I fixed a friend's computer recently that was sold as a K6-2 400, but
was an overclocked K6-2 350 (can't remember the figures exactly).

What I don't want happening is another spate of "but I've only
overclocked my P150 to P166" (that was me a few years ago) and
unnecessary flames about how stupid people are to overclock, without
any reasoning behind them. In fact, any excuse to clarify this issue
before it reaches mailing lists is probably a Good Thing.

Anyway, all my P150s overclocked fine to 166, even if I didn't get
round to checking what was really inside. =)

>      3.  Dodgy memory, and/or motherboards.  If you have multiple memory
> 	 SIMMS installed then pull one out and try again.  If everything
> 	 works now then you've got a bad SIMM.  If it fails again, pull
> 	 out another chip, and so on, until you identify the SIMM.

Again, is this rehashing the sig11 FAQ? Also, shouldn't this be t'other
way round ? Pull 'em all out and try with one at a time; that might
find the faulty SIMM on the next bootup even!
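
To make the "one at a time" idea concrete, here is a rough sketch.
Everything in it is hypothetical: find_bad_simms is a name I made up,
and is_stable_with stands in for "fit only these modules, boot, and
see whether the sig11s come back", which in real life is a person with
a screwdriver rather than a function. It also assumes the board will
boot with a single module fitted, which is not true of every board.

```python
def find_bad_simms(simms, is_stable_with):
    """Test each SIMM on its own and return the ones that misbehave.

    simms is a list of labels for the modules. is_stable_with is a
    hypothetical callback that takes the list of modules currently
    fitted and returns True if the machine ran without problems.
    """
    bad = []
    for simm in simms:
        # Fit just this one module and see whether the machine behaves.
        if not is_stable_with([simm]):
            bad.append(simm)
    return bad


if __name__ == "__main__":
    # Pretend module "B" is the faulty one.
    def pretend_check(fitted):
        return "B" not in fitted

    print(find_bad_simms(["A", "B", "C"], pretend_check))  # ['B']
```

Testing each module alone also copes with more than one bad SIMM,
since each result doesn't depend on which other modules are fitted.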

> 	 Some motherboards are also known to have problems if you fill
> 	 up all the memory banks.

I believe you on this one Nik, but I think that the probability of
someone reading this having been hit by that last problem is slim,
whereas the chance of them believing it in preference to a more likely
(and perhaps the actual) cause is high. I'll leave my comment there
though, as I'm not sure either.

On Tue, Sep 19, 2000 at 02:03:59PM -0400, James Housley wrote:
> Very true.  Most MB's cant handle the 36chip DIMMs.

Aha, should we point people in the direction of their motherboard
manuals? Mine specifically mentions that more than 16 chips on a
DIMM are not supported by the m/b and may be unreliable.

On Tue, Sep 19, 2000 at 02:51:49PM -0700, Jeff Gibbons wrote:
>     4.  Unclean or insufficient power to the motherboard.  If
> 	you have any unused I/O boards, hard disks, or CDROMs in
> 	your system, try temporarily removing them or disconnecting
> 	the power cable from them, to see if your power supply can
> 	manage a smaller load.  Or try another power supply,
> 	preferably one with a little more power (for instance, if
> 	your current power supply is rated at 250 Watts try one
> 	rated at 300 Watts).

I watched a computer (Windows) reboot from a power glitch yesterday,
so it's strange that this should come along now, but are we getting into
"my hardware is going weird on me"? That's a much longer discussion!


Okay, that was overly long, but I think I've commented on everything
people sent me since yesterday, and hopefully I've helped.

Thanks again for everyone's input, good to see I've stirred up some
interest in something =)

	Steve

