From owner-freebsd-questions  Tue Jun 18 16:03:13 1996
Return-Path: owner-questions
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id QAA07053 for questions-outgoing; Tue, 18 Jun 1996 16:03:13 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id QAA07039; Tue, 18 Jun 1996 16:03:08 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id PAA11374; Tue, 18 Jun 1996 15:59:17 -0700
From: Terry Lambert
Message-Id: <199606182259.PAA11374@phaeton.artisoft.com>
Subject: Re: FreeBSD works with Cy486DLC processors?
To: Eloy.Paris@ven.ra.rockwell.com (Eloy A. Paris)
Date: Tue, 18 Jun 1996 15:59:16 -0700 (MST)
Cc: michaelv@HeadCandy.com, hardware@freebsd.org, questions@freebsd.org
In-Reply-To: <2.2.16.19960618093049.301ff69c@zeus.ven.ra.rockwell.com> from "Eloy A. Paris" at Jun 18, 96 09:13:02 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-questions@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> Well, I do not have an external cache. I was talking about the internal
> cache. That one is disabled. But anyway, what does the internal cache have
> to do with the mother board design? That is internal to the processor... The
> external cache is also disabled because I don't have one.
>
> I have not tried anything besides FreeBSD on this machine.

When you do bus-mastering DMA from a controller card (ethernet, video
capture, or disk), the controller writes directly to system memory rather
than using the processor to copy data from memory on the card to your
system memory.  This offloads I/O processing to the card, freeing the CPU
to run code, do calculations, etc.  This is why DMA controllers are better
than programmed I/O controllers, and is one of the fundamental reasons
UNIX people are so pro-SCSI and so anti-IDE.  Yes, there are some IDE
controllers that do DMA, but they are nearly impossible to detect without
crashing the ones that don't do DMA, so you can't build generic code for
them without serious manufacturer documentation, like ROM recognition
algorithms.

When you have a cache (internal or external), the processor fetches data
from the cache instead of hitting the memory bus (for an internal cache),
or from the cache instead of potentially hitting the wait states
associated with real memory (for an external cache).

Because the processor will prefer these caches, when you DMA you have to
tell the cache controller circuitry (for an external cache), or the
processor (for an internal cache), that the cache is out of date.  Really,
what you are telling it is "this region of memory has changed", and it
decides what to do from there.

The best cache hardware will say "I'd better reload that data from
memory".  This is called "cache update".  Working cache hardware that
isn't quite the best will say "I'd better throw the data away, and if it's
referenced again (which it will be, when the I/O completion interrupt
triggers running the code that asked for the read in the first place),
I'll reload it".  This is called "cache invalidation".
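To put that in concrete terms, here is a rough sketch (in C, with made-up
function and structure names -- this is not code from any real driver) of
what a bus-master read completion has to do on hardware whose cache logic
does NOT update or invalidate by itself.  WBINVD is the 486-class
write-back-and-invalidate instruction mentioned under solution 2 below;
it is privileged, so this only makes sense in kernel context:

	/*
	 * Illustrative sketch only -- the names here are hypothetical.
	 * Shows why the CPU cache has to be told about a bus-master DMA
	 * write into memory before the CPU looks at that memory again.
	 */
	struct dma_buf {
		char	*vaddr;		/* buffer the controller DMAs into */
		int	 len;
	};

	/*
	 * Write back and throw away everything in the CPU's caches.
	 * WBINVD exists on 486-class and later parts; on hardware whose
	 * cache snooping works, this is pure waste, because the chipset
	 * updates or invalidates the affected lines on its own.
	 */
	static __inline void
	cpu_cache_flush(void)
	{
		__asm__ __volatile__("wbinvd" : : : "memory");
	}

	/* Hypothetical interrupt-time completion routine for a bus-master read. */
	void
	xx_dma_read_done(struct dma_buf *b)
	{
		/*
		 * The controller has already written b->vaddr[0..b->len-1]
		 * behind the CPU's back.  If the cache logic is broken (or
		 * not hooked up), any cache lines covering the buffer are
		 * now stale, so get rid of them before the code that asked
		 * for the read runs and looks at the data.
		 */
		cpu_cache_flush();

		/* ... hand the buffer up to whoever requested the read ... */
	}

On hardware with working cache circuitry the flush is wasted work, which
is why you only want something like this switched on when the broken
hardware has actually been detected -- the same trade-off comes up in the
solutions below.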
There are motherboards that do not have working cache circuitry, and they
can't simply be identified by manufacturer, since manufacturers revise
boards without telling anyone.

This can be an inherent design flaw in the chipset (like the original
Saturn I, Neptune, and Mercury PCI chipsets from Intel, sold in early
Gateway and Dell 60/66 MHz P5 systems), or it can be a motherboard design
flaw (it's just not hooked up), or it can be a "feature": VESA Local Bus
[VLB] systems won't signal a DMA to the cache logic unless the VESA
controller is in a "master" slot.  Which slot, if any, is a "master" is
undocumented; this is one of the fundamental reasons UNIX people are so
violently anti-VESA/VLB.

A motherboard flaw can affect the external cache only, the internal cache
only, or both.  If both are affected, either you are using a VESA/VLB
card, or a PCI card with a flawed chipset, or the motherboard is simply
broken.

"The Cyrix problem" is actually a Cyrix/TI problem.  Cyrix and TI built a
chip with an internal cache.  Cyrix has since signed a deal with IBM to
use their "Blue Lightning" chip masks, and chips built using the IBM
design (Cyrix is still selling chips built using the old design) don't
have the problem.  Of course, it's hard to tell which chips have the
problem, other than "it is a Cyrix/TI chip" or "it isn't a Cyrix/TI chip".

"The Cyrix problem" has two stories.  The first is that the chip was built
to be plugged into 386 sockets, and 386's don't have a cache change
notification pin, so there is absolutely no way to tell the chip that the
memory it has cached is no good.  Even if there were such a pin, the
motherboard doesn't expect to have the ability to tell the processor that
the cache is invalid, so there is no circuitry to drive it.

The second story is that the chips don't honor the "non-cacheable" bit, so
even if you write Cyrix-knowledgeable routines, you are still screwed.

What can BSD do?

1)	The Cyrix chips are detectable, and you can turn the internal
	cache off:

		"The Undocumented PC", Frank Van Gilluwe,
		Addison-Wesley, ISBN 0-201-62277-7

	If you do this, you will lose performance, but anything that
	works is better than anything that doesn't.  (A rough sketch of
	what this might look like is appended at the end of this
	message.)

2)	If the chip supports it, you can mark the pages non-cacheable,
	or, alternatively, manually invalidate them on DMA completion
	using WBINVD.  This assumes you can detect the cache failure, so
	you know to turn the code on (you don't want it on all the time,
	since it reduces performance).

3)	Also assuming the DMA failure is somehow detectable, you can
	have two versions of every driver, so you can use PIO instead of
	DMA.  This is a *huge* performance hit in many cases.

4)	You can replace the faulty hardware.

Generally, solutions 1-3 have not been implemented; Linux implements
solution 1.

Solutions 2 and 3 require detecting the DMA hardware, then forcing a
situation where the DMA failure is detectable -- not an easy thing in all
cases, or with all types of hardware.  For small caches, it means adding
a "frob" to the driver in case it's the only DMA device in the machine,
and using the frob to avoid LRU'ing the test data out of the smallest
possible L1 and L2 caches that you might want to test.

Most people will recommend solution 4, rather than track down the
hardware errors.

The problem with implementing any of 1-3 is that it would take
knowledgeable people intentionally buying broken hardware for it to be
possible to test -- and that just isn't going to happen for most of us,
who can barely afford good hardware, let alone blow money on broken
hardware for testing.

Hope this clears things up.

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
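Here is the sketch referred to under solution 1.  It is illustrative
only, not tested code: the divide test is the commonly published trick
for spotting Cyrix parts (they leave the undefined flags alone after a
DIV, where Intel parts change them), and the CR0 CD/NW bits are the
generic 486 way of shutting the cache off.  Whether a particular Cy486DLC
honors the CR0 bits, or wants its Cyrix-specific configuration registers
handled instead, is exactly the kind of detail the Van Gilluwe book is
for.

	/* Returns nonzero if DIV left the flags untouched (Cyrix-like part). */
	static int
	probe_cyrix_flags(void)
	{
		unsigned int result;

		__asm__ __volatile__(
			"sahf\n\t"	/* load a known value (AH = 0) into the flags */
			"div %b2\n\t"	/* 5 / 2 -- undefined flag results differ by vendor */
			"lahf"		/* read the flags back into AH */
			: "=a" (result)
			: "0" (5), "q" (2)
			: "cc");

		/* Flags still as SAHF left them (0x02) => DIV didn't touch them. */
		return ((result >> 8) & 0xff) == 0x02;
	}

	/*
	 * Disable the internal cache: set CR0.CD (bit 30) and CR0.NW
	 * (bit 29), then flush, so no lines stay valid and nothing is
	 * ever cached again.  Privileged; kernel context only.
	 */
	static void
	cpu_cache_disable(void)
	{
		unsigned int cr0;

		__asm__ __volatile__("movl %%cr0, %0" : "=r" (cr0));
		cr0 |= 0x60000000;			/* CD | NW */
		__asm__ __volatile__("movl %0, %%cr0" : : "r" (cr0));
		__asm__ __volatile__("wbinvd" : : : "memory");
	}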