Date:      Mon, 3 Jun 1996 16:49:13 +0300 (EET DST)
From:      "Andrew V. Stesin" <stesin@elvisti.kiev.ua>
To:        se@zpr.uni-koeln.de (Stefan Esser)
Cc:        hardware@freebsd.org, doc@freebsd.org
Subject:   Mystery has gone! Thanks! (How a non-obvious HW problem was solved)
Message-ID:  <199606031349.QAA09685@office.elvisti.kiev.ua>
In-Reply-To: <199606022214.AA22260@Sisyphos> from "Stefan Esser" at Jun 3, 96 00:14:33 am

Dear Stefan and FreeBSD people,

it seems to me that I found a REAL solution to this. See below.

[... a configuration I'm talking about: ...]

# } A machine, our recently built firewall gateway to Internet,
# } is:
# } 	ATC-1425B mainboard, PCI, SiS 496/7 chipset;
# } 	16Mb RAM;
# } 	AMD 5x133 CPU;
# } 	NCR 53c810 SCSI;
# } 	1Gb Conner CFP1060S drive (recent, good one);
# } 	two modems on the onboard COMs (SLIP lines to the world);
# } 	1 Ethernet card.
# } 
# } 	OS:      FreeBSD-stable as of late March.
# } 	Add-ons: IPfilter 3.0.3+ (by Darren Reed) as in-kernel IP filtering
# } 		 facility, Squid 1.0beta7 WWW proxy cache daemon.
# } 
# } The machine was experiencing spontaneous reboots from time to time.
# } Either silent reboots, or prefaced with messages from NCR driver
# } (like "NCR dead?").

[... kind explanations and suggestions mostly omitted ...]

# The main difference is that the 21041 is a PCI bus-master.

	Yes, that's why I took it out -- my first guess was that
	this particular MB has some breakage in PCI
	implementation internals, which breaks busmastering PCI
	devices (isn't NCR a busmaster, too, btw?)
	Now I see I was wrong.

# There have been other motherboards that did not work correctly
# with multiple PCI bus masters, but I have no idea about the SiS
# chip set being broken in such a way.

	SiS 496/7-based MBs are "the line of choice" for 486
	boards at our site. They're generally Ok -- not as fast
	as ASUS SP3G (I have some experience with those, too, but
	they disappeared from stock recently); they're stable and
	reliable.  We have some older SiS boards from SOYO,
	and the ATC-1425Bs are from a different vendor (also Taiwanese)
	and they do support AMD 5x133. I have also seen ASUS boards with
	SiS 496/7 (SP3) -- I didn't like them (only 2 RAM sockets,
	were unstable under FreeBSD, though people claim that it was due
	to ancient BIOS firmware).

	As for multiple busmasters in SiS boards... We had a
	4-ether router for a while, with: NCR, 2 'lnc' AMD PCI boards,
	and Realtek PCI NE2000 clone. All 4 PCI slots were full.
	Lance ethers are busmasters, supported by ISA driver
	(PCI NE2000 worked with ISA 'ed' driver). CPU was AMD dx2/80.

	This monster was reliable and fast, but it occasionally threw
	a couple of messages about failed DMA on lnc[01] and "NCR dead?"
	under peak loads.  But the drivers performed a hardware reset,
	and it worked for weeks this way.  Being a cautious person,
	I redesigned the network layout recently :)  when the Realtek
	PCI NE2000 card died :-)))

	My experience tells me that SiS 496/7 boards are Ok,
	reasonably "old" and stable, but
	they do not enjoy overloading of their slots with peripherals.
	If you fill all ISA and PCI slots -- be ready for
	spontaneous crashes and hardware troubles.  (Seen this on
	our UUCP mail host). Having at least one ISA and one PCI slot empty
	is Ok.

# Some systems did not work reliably with all PCI performance
# options enabled (e.g. PCI Burst Mode, Write Buffers, ...), and

	As I was told by the hardware technical guys, these problems
	were pretty common half a year ago; recent revisions of BIOSes
	(Award, AMI) are improved and the problems (kind of?) went away.

# I have seen other reports where a high interrupt load made the
# kernel fail with the PC pointing into the NCR driver. But I do
# not think this necessarily points out a driver problem, since

	You're 101% right.

[...]
# I've been using the NCR and a DEC 21040 based Znyx 312 for some
# time in my ASUS SP3G system, and never had the kind of trouble 
# you see.

	Our "approved" kind of HW setup is: SiS496/7 based board,
	AMD 5x133 CPU, NCR 53c810, IBM SCSI drive(s),
	DEC 21040-based ether, any S3 868 video,
	other periph. to your taste, 16+ megs of RAM.
	Cheap, solid and productive; I highly recommend it.

# If your system currently got any performance options enabled, I'd 
# just try without them. Wait states added to memory and cache accesses 
# and PCI setup to work without burst transfers should help find a 
# possible hardware performance problem.

	The final solution which I found:  the SIMMs weren't of appropriate
	quality!!! even though they were marked as 60ns!!!! WHAT A FSCK!!!!

	The DRAM chips on the SIMMs are Texas Instruments, detailed chip
	info available upon request (in case anyone is interested).

	ATC-1425B has "Auto configuration" option in BIOS setup.
	"Huh, it should be a pretty safe kind of setup, if it
	puts ISA to 7.159MHz!" -- I thought initially :)  It was
	turned "on".

	After all kinds of fighting with PCI setup options
	(performance degraded -- but still crashes!) that's what I did
	two days ago:

	1. Turned "Auto config" in BIOS "off".
	2. ISA BUS clock -- put to 33MHz/4 -- it's appropriate.
	3. Added a _single_ (!) wait state to the BIOS timing
	   which manages transfers between L2 cache (btw L2 cache
	   is 15ns on ATC-1425B board) and main DRAM,
	   just changed it from 2 to 3.
	   (The machine is up now, if someone needs the exact name
	   of this BIOS option -- ask).
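
To put numbers on steps 2 and 3, here is a small arithmetic sketch. It assumes a nominal 33 MHz bus clock (taken from the "33MHz/4" setting) and that one wait state costs one cycle of that clock; the function names are made up for illustration and do not come from any BIOS or tool.

```python
# Assumption: nominal 33 MHz bus clock, as in the "33MHz/4" ISA setting.
BUS_CLOCK_MHZ = 33.0


def isa_clock_mhz(bus_mhz, divider):
    """ISA bus clock derived from the bus clock by an integer divider."""
    return bus_mhz / divider


def clock_period_ns(bus_mhz):
    """Length of one bus clock cycle in nanoseconds."""
    return 1000.0 / bus_mhz


def added_delay_ns(bus_mhz, old_wait_states, new_wait_states):
    """Extra time per access, assuming one wait state is one bus cycle."""
    return (new_wait_states - old_wait_states) * clock_period_ns(bus_mhz)


if __name__ == "__main__":
    print("ISA clock: %.2f MHz" % isa_clock_mhz(BUS_CLOCK_MHZ, 4))
    print("Bus cycle: %.1f ns" % clock_period_ns(BUS_CLOCK_MHZ))
    print("Going from 2 to 3 wait states adds about %.1f ns per access"
          % added_delay_ns(BUS_CLOCK_MHZ, 2, 3))
```

Under these assumptions the ISA bus runs at 8.25 MHz, and the single extra wait state gives the DRAM roughly 30 ns more per access, which is a large margin for SIMMs marked as 60ns.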

	And -- YESS!!! the problem disappeared! (The machine stood
	up bravely under flood pings and TCP shoots from 3(!) other
	FreeBSD boxes, and with disk activity artificially induced --
	for 48 hours non-stop; previously just 5-10 minutes
	of stress killed it).
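
For anyone who wants to generate artificial disk activity of a similar kind, the following is a minimal sketch of a write-and-verify loop. It is not the procedure used on this machine (that was flood pings, TCP load and disk activity by other means); the function names, sizes and round counts are illustrative assumptions only.

```python
import hashlib
import os
import tempfile


def write_and_verify(path, size_bytes, chunk=64 * 1024):
    """Write random data to path, read it back, and compare SHA-256 hashes."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))
            written.update(block)
            f.write(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # push the data out to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            read_back.update(block)
    return written.digest() == read_back.digest()


def stress(rounds, size_bytes):
    """Run several write/verify rounds; return the number of mismatches."""
    failures = 0
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        for _ in range(rounds):
            if not write_and_verify(path, size_bytes):
                failures += 1
    finally:
        os.remove(path)
    return failures


if __name__ == "__main__":
    print("failed rounds:", stress(rounds=10, size_bytes=1 << 20))
```

Note that the read-back will often be served from the OS buffer cache rather than the disk, so a mismatch here can point at memory as much as at the drive. On a machine with marginal RAM timing, a crash or reboot during the loop is as telling as a failed comparison.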

	The box is still up now, no more problems observed. (Probably I'll try
	to put Lance ether into it, just for experiment -- but
	I simply don't want to reboot it at all, it holds our Inet connection!)

		Thanks to all you friends who supported me!
		Please accept my sincere apologies for taking your time!

	I hope my experience will be of some use for the Hardware
	Compatibility Guide which is now in preparation, and
	people will benefit a bit from it.

-- 

	With best regards -- Andrew Stesin.

	+380 (44) 2760188	+380 (44) 2713457	+380 (44) 2713560

	"You may delegate authority, but not responsibility."
					Frank's Management Rule #1.


