Date:      Sun, 10 Mar 1996 12:21:10 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        imb@scgt.oz.au (michael butler)
Cc:        rgrimes@GndRsh.aac.dev.com, current@FreeBSD.org
Subject:   Re: AMD doesn't like SNAP! (panic: unwire: page not in pmap)
Message-ID:  <199603101921.MAA01621@phaeton.artisoft.com>
In-Reply-To: <199603101425.BAA02483@asstdc.scgt.oz.au> from "michael butler" at Mar 11, 96 01:25:33 am

> I note that through a number of drivers there is mention of
> cache-invalidation instructions (software-style) but none of them seem to
> implement anything of this nature. Is there a problem with doing this ..
> that is .. to invalidate the page(s) into which data has just been
> transferred prior to the application being told that the transfer completed? 
> No other cases need to be considered do they ?

OK.  This is an interesting topic.  8-(.

Before, when I suggested that a DMA be triggered as part of a probe
process on a per controller basis to determine if bounce buffers
were necessary, I neglected the non-working cache case.  The cache
cases need to be considered at the same time because they need to
trigger similar controller/memory events in order to be detectable.

When a bus master DMA occurs, there is supposed to be a cache
notification, and the L1 and L2 cache lines are supposed to be
invalidated or written back for the memory range in which the DMA
took place.


It's possible to fail the L1 cache invalidate/write back, the L2, or
both.

The old Cyrix/TI 386 processors using the Cyrix chip mask (the newer
Cyrix parts -- not sure about TI -- use a licensed version of the IBM
"Blue Lightning" masks) had an L1 cache but did not implement a cache
notification mechanism.  Because of this, if the L1 cache is enabled
on these chips (it is disabled by default and must be explicitly
enabled by software -- usually BIOS POST based on CMOS settings),
they can be left with stale data after a bus mastering DMA.  These
chips are detectable ("The Undocumented PC"), and the cache can be
disabled in software (overriding the user's preference -- some might
argue that it is a driver problem, and the driver should explicitly
WBINVD instead).

For the L2, the Saturn I chipset (mask date pre-April 1994) had a
flaw, where the DMA notification from PCI was simply not internally
connected to anything.  This is most often seen in Gateway and Dell
systems with 60MHz Pentiums, but they aren't the only ones who used
Saturn I's, so they aren't the only machines with problems.

Finally, VLB systems frequently do not identify "master" slots.  A
"master" slot is one where cache notification occurs after a bus master
DMA takes place.


Now, these aren't the *only* cases, *but* they are the majority of
cases where "turn off the cache" will fix the problem.


Detecting a failed cache update is tricky -- mostly because a small
amount of cache is involved, and you can't tell the difference
between data that was correctly invalidated, and data that was
invalidated to load in your test code, data, etc.

Basically, you have to fully set up a DMA into an area, but not trigger
it, and then modify a small part of the area to force it into cache
with a value other than the one that will result from the DMA.  Then
you trigger the DMA on a small enough operation that it doesn't
cause the cache line holding the stale data to be flushed (you test
this by doing the DMA to an area other than the cached area and using
instruction timing to determine if the data is still in cache or if
the operation blew it out -- a tricky operation).  If reading the
area back then returns your value rather than the DMA'd one, the
cache was never notified.

To avoid cache effects in determining the data that will result,
you have to do WBINVD's (the software cache flush) during the setup
up to the point where you do the cache load... not least because
you won't be able to load the test if there is a cache problem and you
use a DMA-using driver.
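
The probe sequence above can be sketched as a toy model.  This is only
a user-space simulation under stated assumptions, not driver code: the
names toy_machine, cpu_write, cpu_read, dma_write, toy_flush and
probe_cache_coherent are all hypothetical, the "cache" is a single
write-through line, and the snoop_works flag stands in for whether the
hardware delivers the cache notification.  Real hardware would need an
actual DMA-capable controller and the timing check described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE  64   /* toy DMA target area (hypothetical size) */
#define LINE_SIZE 16   /* toy cache line size (hypothetical size) */

enum { MARKER = 0xAA, DMA_BYTE = 0x55 };

struct toy_machine {
    uint8_t ram[BUF_SIZE];    /* memory as the DMA engine sees it */
    uint8_t line[LINE_SIZE];  /* the single simulated cache line */
    size_t  line_base;        /* RAM offset the line shadows */
    bool    line_valid;
    bool    snoop_works;      /* does DMA notify the cache? */
};

static bool line_hit(const struct toy_machine *m, size_t off)
{
    return m->line_valid && off >= m->line_base &&
           off < m->line_base + LINE_SIZE;
}

static void line_fill(struct toy_machine *m, size_t off)
{
    m->line_base = off - (off % LINE_SIZE);
    memcpy(m->line, &m->ram[m->line_base], LINE_SIZE);
    m->line_valid = true;
}

/* CPU store: write-through to RAM, and keep a copy in the cache line. */
static void cpu_write(struct toy_machine *m, size_t off, uint8_t v)
{
    assert(off < BUF_SIZE);
    m->ram[off] = v;
    if (!line_hit(m, off))
        line_fill(m, off);
    m->line[off - m->line_base] = v;
}

/* CPU load: served from the cache line when it hits, else filled from RAM. */
static uint8_t cpu_read(struct toy_machine *m, size_t off)
{
    assert(off < BUF_SIZE);
    if (!line_hit(m, off))
        line_fill(m, off);
    return m->line[off - m->line_base];
}

/* Bus master DMA: always updates RAM; invalidates the line only if the
 * notification actually reaches the cache. */
static void dma_write(struct toy_machine *m, size_t off,
                      const uint8_t *src, size_t len)
{
    assert(off + len <= BUF_SIZE);
    memcpy(&m->ram[off], src, len);
    if (m->snoop_works && m->line_valid &&
        off < m->line_base + LINE_SIZE && off + len > m->line_base)
        m->line_valid = false;
}

/* Stand-in for the software cache flush (nothing to write back here,
 * because the toy cache is write-through). */
static void toy_flush(struct toy_machine *m)
{
    m->line_valid = false;
}

/* Returns true if the DMA'd data is what the CPU reads back. */
static bool probe_cache_coherent(struct toy_machine *m)
{
    uint8_t payload[LINE_SIZE];

    toy_flush(m);                 /* start from a known cache state */
    cpu_write(m, 0, MARKER);      /* force the line in with the "wrong" value */
    memset(payload, DMA_BYTE, sizeof payload);
    dma_write(m, 0, payload, sizeof payload);  /* small transfer: no eviction */
    return cpu_read(m, 0) == DMA_BYTE;
}
```

In this model a machine with snoop_works set reads back the DMA'd byte
and the probe reports the cache as coherent; with it clear, the probe
reads the stale marker until toy_flush is called.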


It's arguable whether WBINVD'ing on all your I/O is worthwhile -- the
benefit you get from the cache the rest of the time may not compensate
for the flush overhead; probably there will be *some* gain, but it
will be marginal.


One very real problem is that the people hacking the code areas that
would need to change simply don't buy the cheap hardware necessary to
reproduce the problem and allow them to test.


Finally, for the less likely case of a flaw in the motherboard L2 cache
implementation (which may mean you cannot successfully WBINVD the area
to work around the bug in software), there is no fix except disabling
the L2 cache.  8-(.


Because of the need for a DMA card and a driver to use it, it's no wonder
that this type of testing has not made it into a consumer "hardware test"
application that you can use in a store before buying the hardware.


It isn't worthwhile to WBINVD in the majority of cases, because the
majority is people with functional cache hardware.  Admittedly, we
have self-selected this by not "fixing" the problem in software for
people without the good hardware.

A decent fix would require a lot of effort and a lot of hooks to make
it work in all cases (for instance, I could have two VLB controllers,
one in a "master" slot, one not; the fix has to have per-controller
granularity, etc.).  8-(.
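
As a rough illustration of what per-controller granularity might look
like, here is a minimal sketch.  Everything in it is hypothetical (the
dma_controller structure, the dma_fixup flags, and dma_fixups_for are
not taken from any existing driver framework); it only assumes that a
probe has already recorded, for each controller, whether bounce buffers
are needed and whether its bus master DMA reaches the caches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Workarounds a transfer may need; values are bit flags. */
enum dma_fixup {
    DMA_FIXUP_NONE   = 0,
    DMA_FIXUP_BOUNCE = 1 << 0,  /* copy through a bounce buffer */
    DMA_FIXUP_FLUSH  = 1 << 1   /* software cache flush after the DMA */
};

/* Per-controller probe results (hypothetical layout). */
struct dma_controller {
    const char *name;
    bool needs_bounce;     /* probe: cannot DMA to arbitrary memory */
    bool cache_notified;   /* probe: DMA triggers cache notification */
};

/* Decide the workarounds for one controller, independent of the others. */
static unsigned dma_fixups_for(const struct dma_controller *c)
{
    unsigned fixups = DMA_FIXUP_NONE;

    assert(c != NULL);
    if (c->needs_bounce)
        fixups |= DMA_FIXUP_BOUNCE;
    if (!c->cache_notified)
        fixups |= DMA_FIXUP_FLUSH;
    return fixups;
}
```

With two VLB controllers, one in a "master" slot and one not, only the
second would come back with DMA_FIXUP_FLUSH set, so the controller in
the working slot would not pay for the flush.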


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


