From owner-freebsd-questions  Tue Jun 18 16:03:13 1996
Return-Path: owner-questions
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id QAA07053 for questions-outgoing; Tue, 18 Jun 1996 16:03:13 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id QAA07039; Tue, 18 Jun 1996 16:03:08 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id PAA11374; Tue, 18 Jun 1996 15:59:17 -0700
From: Terry Lambert
Message-Id: <199606182259.PAA11374@phaeton.artisoft.com>
Subject: Re: FreeBSD works with Cy486DLC processors?
To: Eloy.Paris@ven.ra.rockwell.com (Eloy A. Paris)
Date: Tue, 18 Jun 1996 15:59:16 -0700 (MST)
Cc: michaelv@HeadCandy.com, hardware@freebsd.org, questions@freebsd.org
In-Reply-To: <2.2.16.19960618093049.301ff69c@zeus.ven.ra.rockwell.com> from "Eloy A. Paris" at Jun 18, 96 09:13:02 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-questions@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> Well, I do not have an external cache. I was talking about the internal
> cache. That one is disabled. But anyway, what does the internal cache have
> to do with the mother board design? That is internal to the processor... The
> external cache is also disabled because I don't have one.
>
> I have not tried anything besides FreeBSD on this machine.

When you do bus-mastering DMA from a controller card (ethernet, video
capture, or disk), the controller writes directly to system memory rather
than using the processor to copy data from memory on the card to your
system memory.  This offloads I/O processing to the card, freeing the CPU
to run code, do calculations, etc.  This is why DMA controllers are better
than programmed I/O controllers, and is one of the fundamental reasons
UNIX people are so pro-SCSI and so anti-IDE.  Yes, there are some IDE
controllers that do DMA, but they are nearly impossible to detect without
crashing the ones that don't do DMA, so you can't build generic code for
them without serious manufacturer documentation, like ROM recognition
algorithms.

When you have a cache (internal or external), the processor fetches data
from the cache instead of hitting the memory bus (for an internal cache),
or from the cache instead of potentially hitting the wait states
associated with real memory (for an external cache).

Because the processor will prefer these caches, when you DMA you have to
tell the cache controller circuitry (for an external cache), or the
processor (for an internal cache), that the cache is out of date.  Really,
what you are telling it is "this region of memory has changed", and it
decides what to do from there.

The best cache hardware will say "I'd better reload that data from
memory".  This is called "cache update".  Working cache hardware that
isn't quite the best will say "I'd better throw the data away, and if it's
referenced again (which it will be, when the I/O completion interrupt
triggers running the code that asked for the read in the first place),
I'll reload it".  This is called "cache invalidation".
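To put that in concrete terms, here is a rough sketch (in C, with made-up
function and structure names -- this is not code from any real driver) of
what a bus-master read completion has to do on hardware whose cache logic
does NOT update or invalidate by itself.  WBINVD is the 486-class
write-back-and-invalidate instruction mentioned under solution 2 below;
it is privileged, so this only makes sense in kernel context:

	/*
	 * Illustrative sketch only -- the names here are hypothetical.
	 * Shows why the CPU cache has to be told about a bus-master DMA
	 * write into memory before the CPU looks at that memory again.
	 */
	struct dma_buf {
		char	*vaddr;		/* buffer the controller DMAs into */
		int	 len;
	};

	/*
	 * Write back and throw away everything in the CPU's caches.
	 * WBINVD exists on 486-class and later parts; on hardware whose
	 * cache snooping works, this is pure waste, because the chipset
	 * updates or invalidates the affected lines on its own.
	 */
	static __inline void
	cpu_cache_flush(void)
	{
		__asm__ __volatile__("wbinvd" : : : "memory");
	}

	/* Hypothetical interrupt-time completion routine for a bus-master read. */
	void
	xx_dma_read_done(struct dma_buf *b)
	{
		/*
		 * The controller has already written b->vaddr[0..b->len-1]
		 * behind the CPU's back.  If the cache logic is broken (or
		 * not hooked up), any cache lines covering the buffer are
		 * now stale, so get rid of them before the code that asked
		 * for the read runs and looks at the data.
		 */
		cpu_cache_flush();

		/* ... hand the buffer up to whoever requested the read ... */
	}

On hardware with working cache circuitry the flush is wasted work, which
is why you only want something like this switched on when the broken
hardware has actually been detected -- the same trade-off comes up in the
solutions below.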
There are motherboards that do not have working cache circuitry, and they
can't simply be identified by manufacturer, since manufacturers revise
boards without telling anyone.

This can be an inherent design flaw in the chipset (like the original
Saturn I, Neptune, and Mercury PCI chipsets from Intel, sold in early
Gateway and Dell 60/66 MHz P5 systems), or it can be a motherboard design
flaw (it's just not hooked up), or it can be a "feature": VESA Local Bus
[VLB] systems won't signal a DMA to the cache logic unless the VESA
controller is in a "master" slot.  Which slot, if any, is a "master" is
undocumented; this is one of the fundamental reasons UNIX people are so
violently anti-VESA/VLB.

A motherboard flaw can affect the external cache only, the internal cache
only, or both.  If both are affected, either you are using a VESA/VLB
card, or a PCI card with a flawed chipset, or the motherboard is simply
broken.

"The Cyrix problem" is actually a Cyrix/TI problem.  Cyrix and TI built a
chip with an internal cache.  Cyrix has since signed a deal with IBM to
use their "Blue Lightning" chip masks, and chips built using the IBM
design (Cyrix is still selling chips built using the old design) don't
have the problem.  Of course, it's hard to tell which chips have the
problem, other than "it is a Cyrix/TI chip" or "it isn't a Cyrix/TI chip".

"The Cyrix problem" has two stories.  The first is that the chip was built
to be plugged into 386 sockets, and 386's don't have a cache change
notification pin, so there is absolutely no way to tell the chip that the
memory it has cached is no good.  Even if there were such a pin, the
motherboard doesn't expect to have the ability to tell the processor that
the cache is invalid, so there is no circuitry to drive it.

The second story is that the chips don't honor the "non-cacheable" bit, so
even if you write Cyrix-knowledgeable routines, you are still screwed.

What can BSD do?

1)	The Cyrix chips are detectable, and you can turn the internal
	cache off:

		"The Undocumented PC", Frank Van Gilluwe,
		Addison-Wesley, ISBN 0-201-62277-7

	If you do this, you will lose performance, but anything that
	works is better than anything that doesn't.  (A rough sketch of
	what this might look like is appended at the end of this
	message.)

2)	If the chip supports it, you can mark the pages non-cacheable,
	or, alternatively, manually invalidate them on DMA completion
	using WBINVD.  This assumes you can detect the cache failure, so
	you know to turn the code on (you don't want it on all the time,
	since it reduces performance).

3)	Also assuming the DMA failure is somehow detectable, you can
	have two versions of every driver, so you can use PIO instead of
	DMA.  This is a *huge* performance hit in many cases.

4)	You can replace the faulty hardware.

Generally, solutions 1-3 have not been implemented; Linux implements
solution 1.

Solutions 2 and 3 require detecting the DMA hardware, then forcing a
situation where the DMA failure is detectable -- not an easy thing in all
cases, or with all types of hardware.  For small caches, it means adding
a "frob" to the driver in case it's the only DMA device in the machine,
and using the frob to avoid LRU'ing the test data out of the smallest
possible L1 and L2 caches that you might want to test.

Most people will recommend solution 4, rather than track down the
hardware errors.

The problem with implementing any of 1-3 is that it would take
knowledgeable people intentionally buying broken hardware for it to be
possible to test -- and that just isn't going to happen for most of us,
who can barely afford good hardware, let alone blow money on broken
hardware for testing.

Hope this clears things up.

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
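Here is the sketch referred to under solution 1.  It is illustrative
only, not tested code: the divide test is the commonly published trick
for spotting Cyrix parts (they leave the undefined flags alone after a
DIV, where Intel parts change them), and the CR0 CD/NW bits are the
generic 486 way of shutting the cache off.  Whether a particular Cy486DLC
honors the CR0 bits, or wants its Cyrix-specific configuration registers
handled instead, is exactly the kind of detail the Van Gilluwe book is
for.

	/* Returns nonzero if DIV left the flags untouched (Cyrix-like part). */
	static int
	probe_cyrix_flags(void)
	{
		unsigned int result;

		__asm__ __volatile__(
			"sahf\n\t"	/* load a known value (AH = 0) into the flags */
			"div %b2\n\t"	/* 5 / 2 -- undefined flag results differ by vendor */
			"lahf"		/* read the flags back into AH */
			: "=a" (result)
			: "0" (5), "q" (2)
			: "cc");

		/* Flags still as SAHF left them (0x02) => DIV didn't touch them. */
		return ((result >> 8) & 0xff) == 0x02;
	}

	/*
	 * Disable the internal cache: set CR0.CD (bit 30) and CR0.NW
	 * (bit 29), then flush, so no lines stay valid and nothing is
	 * ever cached again.  Privileged; kernel context only.
	 */
	static void
	cpu_cache_disable(void)
	{
		unsigned int cr0;

		__asm__ __volatile__("movl %%cr0, %0" : "=r" (cr0));
		cr0 |= 0x60000000;			/* CD | NW */
		__asm__ __volatile__("movl %0, %%cr0" : : "r" (cr0));
		__asm__ __volatile__("wbinvd" : : : "memory");
	}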