Date:      Mon, 20 Oct 2003 23:45:21 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Harti Brandt <brandt@fokus.fraunhofer.de>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Random signals in {build,install}world recently?
Message-ID:  <3F94D601.D1B09031@mindspring.com>
References:  <20031020081944.GA40541@kevad.internal> <20031020152755.I47918@beagle.fokus.fraunhofer.de>

Harti Brandt wrote:
> On Mon, 20 Oct 2003, Mark Santcroos wrote:
> MS>On Mon, Oct 20, 2003 at 10:27:38AM +0200, Harti Brandt wrote:
> MS>> On Mon, 20 Oct 2003, Vallo Kallaste wrote:
> MS>> VK>Basically one will get random signals as I have got in build- and
> MS>> VK>installworld. It's impossible to complete make -j2 buildworld on my
> MS>> VK>machine, but sometimes non-parallel buildworld will do, only to die
> MS>> VK>later in installworld.
> MS>> VK>This is on two-processor AMD 2400+ MP system, ASUS A7M-266D mobo and
> MS>> VK>1GB ECC memory, ATA disks and CD/RW-DVD only. 4BSD scheduler if it
> MS>> VK>matters.
> MS>>
> MS>> I have the same MB just with 1800+ processors. I had to reduce the CPU
> MS>> frequency by about 10% in the BIOS setup to get the machine stable. I
> MS>> assume the problem is actually the memory.
> MS>
> MS>Couldn't the following be of help here?
> MS>
> MS>options         DISABLE_PSE
> MS>options         DISABLE_PG_G
> 
> Is the processor bug that these options seem to circumvent dependent on
> the actual operating frequency of the processor?

No.  It is dependent on the amount of memory in the system, and the
specific processor features.  For example, if you have a newer chip
pair, you are more likely to see the problem than on an older system,
though all Pentium class processors supporting 4M pages have the
problems.

If what you are seeing is neither signal 10s in processes nor a
trap 12 (page not present) kernel panic, then the problem on the
1800+ machine is most likely thermal, unless it is in fact bad
memory (have your memory tested on a professional test machine).
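
As a rough illustration of the rule of thumb above, here is a minimal
Python sketch that sorts console/log lines into "signal 10 or trap 12"
versus "some other crash".  The two message shapes it matches, and the
names classify() and summarize(), are assumptions made for the example
and are not taken from this thread; adjust the patterns to whatever
your logs actually contain.

```python
import re

# Assumed message shapes (illustrative, not quoted from the thread):
#   "pid 1234 (cc1), uid 0: exited on signal 10 (core dumped)"
#   "Fatal trap 12: page fault while in kernel mode"
SIGNAL_RE = re.compile(r"exited on signal (\d+)")
TRAP_RE = re.compile(r"Fatal trap (\d+)")


def classify(line):
    """Classify one log line.

    Returns "pse-suspect" for a signal 10 or a trap 12,
    "other-crash" for any other signal or trap number,
    and None for lines that match neither pattern.
    """
    m = SIGNAL_RE.search(line)
    if m:
        return "pse-suspect" if int(m.group(1)) == 10 else "other-crash"
    m = TRAP_RE.search(line)
    if m:
        return "pse-suspect" if int(m.group(1)) == 12 else "other-crash"
    return None


def summarize(lines):
    """Count how many lines fall into each crash category."""
    counts = {"pse-suspect": 0, "other-crash": 0}
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts
```

If nearly everything lands in "other-crash", that points away from the
4M page problem and toward heat, memory, or power, as described above.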

I've noticed a lot of problems with Hynix memory lately; your
mileage may vary.  At Whistle we had a problem with memory with gold
contacts, and didn't have any problems with the ones with tin.

If you can enable HLT in the idle loop, then you will likely see
the intermittent problem clear itself up, if it is in fact thermal.
There is a sysctl for this; also, though I don't know whether it has
been integrated yet, Julian Elischer published a patch that did this
and added an IPI to work around the scheduling latency, which is
what not having the HLT supposedly fixed.

If you are overclocking your machine, or you have bought parts that
have been falsely labeled as higher frequency than they are actually
rated to run at ("counterfeit" chips), then either of these issues
could also be your problem.  So could a borderline power supply.

-- Terry


