Date:      Wed, 28 Jun 95 19:45 CDT
From:      uhclem%nemesis@fw.ast.com (Frank Durda IV)
To:        hackers@freebsd.org
Subject:   2.0.5 caches, lockups and crashes - oh my!
Message-ID:  <m0sR7j9-0004vyC@nemesis.lonestar.org>

[0]On Sun, 25 Jun 1995, Jan Isley wrote:
[0]The first try was just installing bin.  After printing xxxx blocks
[0]on debug window I got Fatal trap 12: page fault while in kernel mode.

[1]Somebody wrote:
[1]Sounds like hardware.  Try disabling the cache.

[2]Somebody wrote:
[2]What is the problem with FreeBSD and caches - I could not install, and 
[2]cannot rebuild a kernel with my cache turned on !  Turning it off is a 
[2]pretty easy fix - but I also take a pretty big performance hit.
[2]What does it mean if your cache is causing these problems ?

[3]From: "Rodney W. Grimes" <rgrimes@gndrsh.aac.dev.com>
[3]Date: Wed, 28 Jun 1995 01:40:53 -0700 (PDT)
[3]It means more than likely one of two things, either you have a marginal
[3]cache SRAM that when pounded on as hard as FreeBSD pounds on a cache
[3]it fails and corrupts data, or you have a bus master DMA cache coherency
[3]problem that fails to invalidate data in the cache.


Well, I have support for over ten different systems that all loved
FreeBSD 1.1.5.1, loved all the SNAPs, and simply will not run worth a
hoot on 2.0.5A or 2.0.5R.  I plan to try the post-2.0.5R SNAPs shortly,
but am not optimistic.

The original symptom (that was reported in these lists during 2.0.5A
testing) was that these systems (all of them made by different makers but
all 486 systems of various speeds and different cache designs and makers)
would ALL refuse to boot from the boot floppy if the cache was turned on.
But if you turn the cache off (or remove it in the case of Intel 485-T
Turbocache modules), you can boot 2.0.5 and even install successfully.

A considerable amount of reinstalling was done and various things were
tried, and all we learned was that the boot failure had something to do
with the fact that the kernel was compressed.  The
uncompressed kernel.MFS could be booted (from hard disk) without incident.

Well, nothing really got done and 2.0.5R went out as it was.  So when
I install it on these Tandy, GRiD, DEC and AST systems, I have to shut
the cache off before the boot from floppy will work.

Now remember that all of these systems worked fine on earlier versions
of FreeBSD.  Some systems ran 1.1.5.1 and early SNAPs 24 hours a day for
several months with nothing more than the once every other week crash
(usually a power failure).  And this was with the cache enabled (and/or
installed).

Once installed, you could put the cache back in or turn it on and the
systems would boot and run from the HD OK.  Or so I thought at the time.

Then I and the other users got around to actually doing more with 2.0.5R
than just installing the system over and over again (and driving from
location to location to do this) and we found that although 2.0.5 would
boot and run with the cache enabled and/or installed (using the
uncompressed kernel on the hard disk), if you gave the system any
significant computing load, such as compiling something, it would lock-up
or crash.  The lock-up was most common.

I could do a make world, and it would go off and churn cleaning up old
files, making directories and such and look like things were going great.
But when it hit the first one or two compiles, the system would die.
Every time.  If you compiled some other program in a completely different
part of the tree, the system would also die.

So I removed the cache on three of the systems again and re-ran the
make world.  20 to 30 hours later, they all completed the builds.   Great,
but nobody wants to touch these machines now because they run so slow.

Again, these systems used to routinely rebuild SNAP systems and
1.1.5.1 systems with make world and did hundreds of kernel builds without
incident.

I find it impossible to believe that months of operation under 1.1.5.1
and the SNAPs didn't beat the system as hard as 2.0.5 does, and that
the cache hardware on all of these different systems is so sensitive and
selective that it can detect the presence of 2.0.5 vs SNAP 04xx and
earlier versions and crash accordingly.  A better explanation is needed.


I have been forced to put 1.1.5.1 back on a couple of systems since it
works with the cache enabled.  One client was annoyed enough to
suggest I install the "L"-system instead.   Ugh.


I still believe some significant problem was introduced in the blast
of major structural modifications that occurred between the 04xx snap
and 2.0.5A.  I have no hard suspects.  It may be the same problem that
causes compressed kernels to malfunction, but it may not be.

In fact, I am looking for a site that still has the 04xx SNAP online
who would be willing to let me FTP a copy so I can run my more performance-
critical systems with the cache enabled rather than go back to 1.1.5.1.

(In my haste to prepare for rapid testing of 2.0.5A, I wiped the copies of
 all the SNAPs I had.  Anyone who can help me out on this, please EMAIL
 direct.  Thanks.)

I would be happy to attempt a staged upgrade from 04xx to 2.0.5, to
try to determine which set of modifications is causing the problem,
particularly if someone would suggest groups of modifications that
go together or must be applied together.  This would be limited to
kernel changes only.  I would also run any other tests that would
help locate the true cause of these problems.
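
A staged upgrade like this amounts to a binary search over an ordered
list of change groups.  The following is only a rough sketch of that
idea, not a tested procedure.  It assumes the 04xx baseline is good, that
once the problem is introduced it stays present in every later stage, and
that some check (here the hypothetical callable is_good, which in
practice would mean building a kernel with the groups applied and running
a compile load with the cache enabled) can judge each stage.  The
function name is made up for illustration.

```python
from typing import Callable, Optional, Sequence


def first_bad_group(
    groups: Sequence[str],
    is_good: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Return the first change group whose addition makes the system fail.

    groups  -- change groups in the order they must be applied
    is_good -- hypothetical check: True if a kernel built with exactly
               these groups applied survives the test load

    Assumes the baseline (no groups applied) is good and that the failure
    persists once the offending group is in.  Returns None if even the
    fully upgraded tree passes.
    """
    if is_good(groups):
        return None

    # Invariant: groups[:lo] is known good, groups[:hi] is known bad.
    lo, hi = 0, len(groups)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(groups[:mid]):
            lo = mid
        else:
            hi = mid
    return groups[hi - 1]
```

With N groups this needs roughly log2(N) kernel builds and test runs
instead of N, which matters when each run is a 20 to 30 hour make world.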

Thanks for any input on this.


					Frank Durda IV
					uhclem%nemesis@fw.ast.com

"What would you rather be running: 
    'FreeBSD',
or  'Bob-Pro' (aka Windows '95)?"   :-)
(C) (TM) 1995 FDIV



