Date:      Tue, 29 Aug 1995 04:20:02 +0300
From:      Heikki Suonsivu <hsu@cs.hut.fi>
To:        "Rashid Karimov." <rashid@haven.ios.com>
Cc:        freebsd-hackers@freefall.FreeBSD.org
Subject:   S.O.S -2.1Stable and ASUSP54TP4
Message-ID:  <199508290120.EAA08982@shadows.cs.hut.fi>
In-Reply-To: "Rashid Karimov."'s message of 28 Aug 1995 21:16:34 +0300


	   system locks at random times w/o any messages at the console/
	   log files. "Locks" means the system becomes unreachable both
	   from the local net and from the console.
	   After I hit "reboot" switch, system reboots up to the fsck
	   level and it starts complaining that it can't read partition
	   information off the second HDD ( Seagate Barracuda 4 Gb) (!).

	   If one hits "reboot" again and goes to the Adaptec BIOS and runs
	   disk utilities --> media check from there - the BIOS (!) complains
	   that it cannot talk to the second HD.

	   The problem goes away _only_ after powercycling the whole PC.
	   I never saw the stuff like this before ... any suggestions ?

What we see here:

One of the SCSI disks becomes unreachable: the IBM 0662s say "Disk drive is
becoming ready" and often survive, while the Seagates lock up.  Usually we
get I/O errors, then a panic, and the system gets stuck in the SCSI BIOS
probes (probably; it says WAIT and sits there until reset, sometimes
requiring several resets or a power cycle).

Almost everything has been changed already; the whole system around it has
been replaced.  The only original parts are the box (power supply) and the
IBM 0662 root disk, and the latter will be replaced by the end of the week.

It has been both a P60 and a P90, and both Buslogic and cheap NCR
controllers have been tried.  Currently it has two NCRs, one with the 0662
and a 4G Seagate Hawk and another with a 1G Seagate Hawk.


I had a Seagate Barracuda in the system, and it gave similar problems.

	   Everything is fine as long as you don't have too much activity
	   going on the system. Some of the servers I have here run for
	   months w/o problems - but they do DNS/WWW/INN stuff.
	   As soon as you put 3000-4000 users on the system - that's when
	   the shit begins.

The problem is clearly load related: we get about one lockup a day, while
the machine is taking in a news feed.  If the news feed is dead, it seems to
stay up fine.

	   Till now the most stable version is the SNAP back from Feb 95.
	   It is up for 24 days, runs 4000 accounts, 50-70 users online.
	   Bad things about it :
	   no support for 2940,SMC EtherPower and QUOTAs don't work.

I'm still suspecting the hardware I have, but after replacing the last
original component I'm running out of ideas.  But it certainly should not
hang in BIOS probes (assuming that Buslogic & NCR did their code right).
This has been around since spring, at least.

I could also find certain sequences of disk accesses which killed the
machine repeatably.  When I switched the 2G Barracuda to a 4G Hawk and
copied the news spool over, tar always hung on certain files.  I tarred the
files before the place it hung separately, removed the copied files, and
reran the tar starting at the point of the last hang; this time it got past
it.  Another case came up when I tried to install an application which
created dbm indexes; it always hung the system when creating the indexes.
So it seems that certain sequences of disk accesses kill the SCSI.

Maybe Seagate did something wrong in their disks?  Tagged queuing?  When
did that come around?

-- 
Heikki Suonsivu, Täysikuu 10 C 83/02210 Espoo/FINLAND,
hsu@cs.hut.fi  home +358-0-8031121 work -4513377 fax -4555276  riippu SN


