Date:      Wed, 24 Apr 1996 12:29:49 +0930 (CST)
From:      Michael Smith <msmith@atrad.adelaide.edu.au>
To:        scrappy@ki.net (Marc G. Fournier)
Cc:        msmith@atrad.adelaide.edu.au, hasty@rah.star-gate.com, current@FreeBSD.org, hackers@FreeBSD.org
Subject:   Re: Intelligent Debugging Tools...
Message-ID:  <199604240259.MAA14238@genesis.atrad.adelaide.edu.au>
In-Reply-To: <Pine.NEB.3.93.960423221546.3520F-100000@freebsd.ki.net> from "Marc G. Fournier" at Apr 23, 96 10:28:37 pm

Marc G. Fournier stands accused of saying:
> > > sctarg0(noadapter::): Processor Target 
> > 
> > Wozzis?  You have a scanner or something?
> >
> 	*shrug* nothing else other than those three cards in the
> machine...

Hmm.  Have you got a 'pt0' in your kernel config?

> > Jumpered for write-back or write-through?  Which cache Tagram are you
> > using?  Is the board jumpered correctly for it?
> >
> 	will check the wb/wt...what do you mean by cache Tagram?

Cache SRAM comes in two parts, the cache itself and the tag ram that
keeps track of what's in the cache.  The SiS boards we use (Soyo) have
jumpers for dealing with two different tag parts.  I don't have, and
have never used, any Acer boards, so I don't know about them.
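
To make the data/tag split concrete, here is a minimal sketch in C of a
direct-mapped cache lookup.  The sizes (LINE_SIZE, NUM_LINES) and all the
names are assumptions picked for illustration; they don't describe any
particular board or chipset.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 16u    /* bytes per cache line (assumed) */
#define NUM_LINES 256u   /* number of lines (assumed, kept small) */

struct cache {
    uint8_t  data[NUM_LINES][LINE_SIZE]; /* the cache itself */
    uint32_t tag[NUM_LINES];             /* the tag RAM: which address each line holds */
    bool     valid[NUM_LINES];           /* has this line been filled at all? */
};

/* Which line an address maps to. */
static uint32_t line_index(uint32_t addr)
{
    return (addr / LINE_SIZE) % NUM_LINES;
}

/* The high address bits that the tag RAM has to remember. */
static uint32_t line_tag(uint32_t addr)
{
    return addr / (LINE_SIZE * NUM_LINES);
}

/* A hit needs a valid line whose stored tag matches the address. */
bool cache_hit(const struct cache *c, uint32_t addr)
{
    uint32_t i = line_index(addr);
    return c->valid[i] && c->tag[i] == line_tag(addr);
}

/* Load a line and record its tag. */
void cache_fill(struct cache *c, uint32_t addr, const uint8_t *line)
{
    uint32_t i = line_index(addr);
    memcpy(c->data[i], line, LINE_SIZE);
    c->tag[i] = line_tag(addr);
    c->valid[i] = true;
}
```

If the tag comparison goes wrong for any reason, the lookup hands back a
line that belongs to some other address, and nothing downstream can tell.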

> > The problem, such as it is, is that these pieces of code depend intimately
> > on some _very_ heavily accessed parts of memory, and if there's anything
> > wrong with these parts of memory, they cause a panic.
>
> 	Okay, that part I don't have a problem with, but, if it
> detects that something is wrong, why doesn't it try (or does it?) to
> work around the problem?  Map out the offending area of memory?  As

How?  Has the code followed a bad pointer to the area it's working on?
Is the location it just read from bad, or is it comparing it with a 
bad value it picked up a while ago?  Is it the RAM that's bad, or the 
cache?  

You can't just "map out" the bits of memory that the code is working
with - all the critical data structures that the system depends on 
are in there - to "map them out" would mean total chaos.

Simpler by far is to say "this value is impossible" and tell the owner
of the losing hardware that it lost.
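
As a rough userland sketch of that approach (not the actual kernel code;
struct frame_ref, NFRAMES and both functions are hypothetical names made
up for this example), the check looks something like:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NFRAMES 1024u   /* hypothetical number of physical page frames */

/* Hypothetical bookkeeping record; not a real FreeBSD structure. */
struct frame_ref {
    uint32_t frame;     /* must be below NFRAMES */
    uint32_t refcount;  /* must be nonzero while the frame is mapped */
};

/* True only if every field holds a value the code could have written. */
bool frame_ref_sane(const struct frame_ref *r)
{
    return r->frame < NFRAMES && r->refcount != 0;
}

/* Stop on an impossible value; there is no way to know which copy is right. */
void frame_ref_check(const struct frame_ref *r)
{
    if (!frame_ref_sane(r)) {
        fprintf(stderr, "panic: impossible frame_ref (frame=%u refcount=%u)\n",
                (unsigned)r->frame, (unsigned)r->refcount);
        abort();
    }
}
```

The check can tell that a value is impossible, but not whether the RAM,
the cache or an earlier bad pointer produced it, so stopping is the only
safe response.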

> well, does this restrict the problem to cache/RAM vs swap space?
> I don't yet understand how swap space works (I know what it does,
> just not how), but I presume that the swap space is a pseudo-file
> system?  Since I'm on SCSI, I would assume that it would auto-remap
> defective areas in the swap space, if my first assumption is
> correct?

You should spend some time at night with the new daemon book, which
should be out shortly - "The Design and Implementation of the 4.4BSD
Operating System" - and perhaps also "The Magic Garden Explained".

Once you've digested those two, you'll have an appreciation (if not
an understanding) of the issues involved.  If they give you a headache,
try "Operating Systems: Design and Implementation" by Tanenbaum (the
Minix book) first, and then come back.

> 	I can understand having bad luck with one machine, but I have
> three machines here, all with different hardware and they all have 
> "hardware related" panics, two are -stable, one is -current (so I kind
> of expect problems with the third)

We have anything from a few to about a dozen machines around here at any
one time, a mix of 2.1R, -stable and -current.  A few have known hardware
bogons (old systems), but aside from that (and the machines running 
some of my less-than-wonderful device drivers), we don't see many 
problems.  (Usually someone else runs across the really nasty ones
and I hold off updating the kernel until it's fixed.)

Having said that, there _was_ a recent commit to the pmap code that
could well have some bearing on your problem.  Check it out 8)

> Marc G. Fournier                                  scrappy@ki.net

-- 
]] Mike Smith, Software Engineer        msmith@atrad.adelaide.edu.au    [[
]] Genesis Software                     genesis@atrad.adelaide.edu.au   [[
]] High-speed data acquisition and      (GSM mobile) 0411-222-496       [[
]] realtime instrument control          (ph/fax)  +61-8-267-3039        [[
]] Collector of old Unix hardware.      "Where are your PEZ?" The Tick  [[


