Date:      Tue, 13 Oct 1998 13:47:39 +0200
From:      Neil Blakey-Milner <nbm@rucus.ru.ac.za>
To:        current@FreeBSD.ORG
Subject:   Some SCSI(?) problems whilst running SMP
Message-ID:  <19981013134739.A26388@rucus.ru.ac.za>

Hi

We're in the unfortunate position of having to run a non-SMP kernel on our
dual-processor machine, due to the following:

uname -a: (of the non-SMP kernel)

FreeBSD rucus.ru.ac.za 3.0-BETA FreeBSD 3.0-BETA #0: Mon Oct  5 04:49:12 SAT 1998     nbm@rucus.ru.ac.za:/usr/src/sys/compile/RUCUS  i386

Sources are from October 4th, without softupdates, devfs, and other fun
things.

//---------------------------------------------------------------------
/usr/src/sys/i386/conf/RUCUS:
# SMP-GENERIC -- Smp machine with WD/AHx/NCR/BTx family disks
#
# For more information read the handbook part System Administration -> 
# Configuring the FreeBSD Kernel -> The Configuration File. 
# The handbook is available in /usr/share/doc/handbook or online as
# latest version from the FreeBSD World Wide Web server 
# <URL:http://www.FreeBSD.ORG/>
#
# An exhaustive list of options and more detailed explanations of the 
# device lines is present in the ./LINT configuration file. If you are 
# in doubt as to the purpose or necessity of a line, check first in LINT.
#
#	$Id: SMP-GENERIC,v 1.16 1998/09/25 17:34:48 peter Exp $

machine		"i386"
# SMP does NOT support 386/486 CPUs.
#cpu		"I386_CPU"
#cpu		"I486_CPU"

cpu		"I586_CPU"
cpu		"I686_CPU"
ident		GENERIC
maxusers	256

# Create a SMP capable kernel (mandatory options):
#options		SMP			# Symmetric MultiProcessor Kernel
#options		APIC_IO			# Symmetric (APIC) I/O

options         "MAXDSIZ=(512*1048576)"	# Max allowed size of process
options         "DFLDSIZ=(256*1048576)"	# Default Max size of process

# Optional, these are the defaults:
#options		NCPU=2			# number of CPUs
#options		NBUS=4			# number of busses
#options		NAPIC=1			# number of IO APICs
#options		NINTR=24		# number of INTs

# Lets always enable the kernel debugger for SMP.
#options		DDB



# SMP shouldn't need x87 emulation, disable by default.
#options		MATH_EMULATE		#Support for x87 emulation

options		INET			#InterNETworking
options		FFS			#Berkeley Fast Filesystem
options		NFS			#Network Filesystem
#options		MSDOSFS			#MSDOS Filesystem
options		"CD9660"		#ISO 9660 Filesystem
options		PROCFS			#Process filesystem
options		"COMPAT_43"		#Compatible with BSD 4.3 [KEEP THIS!]
options		SCSI_DELAY=15000	#Be pessimistic about Joe SCSI device
options		UCONSOLE		#Allow users to grab the console
options		FAILSAFE		#Be conservative
options		USERCONFIG		#boot -c editor
options		VISUAL_USERCONFIG	#visual boot -c editor
options		INCLUDE_CONFIG_FILE
options		"MD5"

options		IPFIREWALL
options		IPFIREWALL_VERBOSE
options		QUOTA

config		kernel	root on wd0

controller	isa0
controller	eisa0
controller	pci0

controller	fdc0	at isa? port "IO_FD1" bio irq 6 drq 2 vector fdintr
disk		fd0	at fdc0 drive 0
disk		fd1	at fdc0 drive 1
# Unless you know very well what you're doing, leave ft0 at drive 2, or
# remove the line entirely if you don't need it.  Trying to configure
# it on another unit might cause surprises, see PR kern/7176.
tape		ft0	at fdc0 drive 2

options		"CMD640"	# work around CMD640 chip deficiency
controller	wdc0	at isa? port "IO_WD1" bio irq 14 vector wdintr
disk		wd0	at wdc0 drive 0
disk		wd1	at wdc0 drive 1

controller	wdc1	at isa? port "IO_WD2" bio irq 15 vector wdintr
disk		wd2	at wdc1 drive 0
disk		wd3	at wdc1 drive 1

options		ATAPI		#Enable ATAPI support for IDE bus
options		ATAPI_STATIC	#Don't do it as an LKM
device		wcd0	#IDE CD-ROM

# A single entry for any of these controllers (ncr, ahb, ahc, amd) is
# sufficient for any number of installed devices.
#controller	ncr0
#controller	amd0
#controller	ahb0
controller	ahc0
#controller	isp0

options		AHC_ALLOW_MEMIO

# This controller offers a number of configuration options, too many to
# document here  - see the LINT file in this directory and look up the
# dpt0 entry there for much fuller documentation on this.  The options
# line following dpt0 here is also currently a *required* option for it.
# controller      dpt0
# options DPT_MEASURE_PERFORMANCE

#controller	adv0	at isa? port ? cam irq ?
#controller	bt0	at isa? port ? cam irq ?
#controller	aha0	at isa? port ? cam irq ?
#controller	uha0	at isa? port "IO_UHA0" bio irq ? drq 5 vector uhaintr
#controller	aic0	at isa? port 0x340 bio irq 11 vector aicintr
#controller	nca0	at isa? port 0x1f88 bio irq 10 vector ncaintr
#controller	nca1	at isa? port 0x350 bio irq 5 vector ncaintr
#controller	sea0	at isa? bio irq 5 iomem 0xc8000 iosiz 0x2000 vector seaintr

controller	scbus0

device		da0

device		sa0

device		pass0

device		cd0	#Only need one of these, the code dynamically grows

device		wt0	at isa? port 0x300 bio irq 5 drq 1 vector wtintr
device		mcd0	at isa? port 0x300 bio irq 10 vector mcdintr

controller	matcd0	at isa? port 0x230 bio

device		scd0	at isa? port 0x230 bio

#options		PNP

# syscons is the default console driver, resembling an SCO console
device		sc0	at isa? port "IO_KBD" tty irq 1 vector scintr
# Enable this and PCVT_FREEBSD for pcvt vt220 compatible console driver
#device		vt0	at isa? port "IO_KBD" tty irq 1 vector pcrint
#options		XSERVER			# include code for XFree86
#options		FAT_CURSOR		# start with block cursor
# If you have a ThinkPAD, uncomment this along with the rest of the PCVT lines
#options		PCVT_SCANSET=2		# IBM keyboards are non-std
options		MAXCONS=16
#options		SC_DISABLE_REBOOT

device		npx0	at isa? port "IO_NPX" irq 13 vector npxintr

#
# Laptop support (see LINT for more options)
#
device		apm0    at isa?	disable	flags 0x31 # Advanced Power Management

# PCCARD (PCMCIA) support
#controller	card0
#device		pcic0	at card?
#device		pcic1	at card?

device		sio0	at isa? port "IO_COM1" flags 0x10 tty irq 4 vector siointr
device		sio1	at isa? port "IO_COM2" tty irq 3 vector siointr
device		sio2	at isa? disable port "IO_COM3" tty irq 5 vector siointr
device		sio3	at isa? disable port "IO_COM4" tty irq 9 vector siointr

device		lpt0	at isa? port? tty irq 7 vector lptintr
device		lpt1	at isa? port? tty
device		mse0	at isa? port 0x23c tty irq 5 vector mseintr

device		psm0	at isa? disable port "IO_KBD" conflicts tty irq 12 vector psmintr

# Order is important here due to intrusive probes, do *not* alphabetize
# this list of network interfaces until the probes have been fixed.
# Right now it appears that the ie0 must be probed before ep0. See
# revision 1.20 of this file.
#device de0
#device fxp0
#device tl0
device tx0
#device vx0
#device xl0

device ed0 at isa? port 0x300 net irq  3 iomem 0xd8000 vector edintr

#device ie0 at isa? port 0x300 net irq 10 iomem 0xd0000 vector ieintr
#device ep0 at isa? port 0x300 net irq 10 vector epintr
#device ex0 at isa? port? net irq? vector exintr
#device fe0 at isa? port 0x300 net irq ? vector feintr
#device le0 at isa? port 0x300 net irq 5 iomem 0xd0000 vector le_intr
#device lnc0 at isa? port 0x280 net irq 10 drq 0 vector lncintr
#device ze0 at isa? port 0x300 net irq 5 iomem 0xd8000 vector zeintr
#device zp0 at isa? port 0x300 net irq 10 iomem 0xd8000 vector zpintr
#device cs0 at isa? port 0x300 net irq ? vector csintr

pseudo-device	loop
pseudo-device	ether
pseudo-device	sl	1
#pseudo-device	ppp	1
pseudo-device	tun	4
pseudo-device	pty	256
pseudo-device	gzip		# Exec gzipped a.out's

# KTRACE enables the system-call tracing facility ktrace(2).
# This adds 4 KB bloat to your kernel, and slightly increases
# the costs of each syscall.
options		KTRACE		#kernel tracing

# This provides support for System V shared memory.
#
options         SYSVSHM  
options         SYSVSEM
options         SYSVMSG  
//--------------------------------------------------------------

diff RUCUS RUCUS-SMP
//--------------------------------------------------------------
23c23
< ident         GENERIC
---
> ident         SMP-GENERIC
27,28c27,28
< #options              SMP                     # Symmetric MultiProcessor Kernel
< #options              APIC_IO                 # Symmetric (APIC) I/O
---
> options               SMP                     # Symmetric MultiProcessor Kernel
> options               APIC_IO                 # Symmetric (APIC) I/O
40c40
< #options              DDB
---
> options               DDB
//---------------------------------------------------------------
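
For anyone wanting to reproduce RUCUS-SMP from RUCUS, the diff above boils
down to four one-line changes.  A minimal sketch of applying them with sed
follows.  The function name make_smp_config is made up for illustration, and
the patterns assume the option lines look exactly like the ones in the RUCUS
file above.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): write an SMP variant of a
# non-SMP kernel config by making the same four changes as the diff above.
#   $1 = existing non-SMP config (e.g. RUCUS)
#   $2 = SMP config to write     (e.g. RUCUS-SMP)
make_smp_config() {
    sed \
        -e 's/^\(ident[[:space:]]*\)GENERIC[[:space:]]*$/\1SMP-GENERIC/' \
        -e 's/^#\(options[[:space:]]*SMP[[:space:]]\)/\1/' \
        -e 's/^#\(options[[:space:]]*APIC_IO[[:space:]]\)/\1/' \
        -e 's/^#\(options[[:space:]]*DDB[[:space:]]*\)$/\1/' \
        "$1" > "$2"
}
```

Running "make_smp_config RUCUS RUCUS-SMP" and then "diff RUCUS RUCUS-SMP"
should show only the ident, SMP, APIC_IO and DDB lines changing; the other
commented-out options (NCPU, NBUS, and so on) are left alone.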

/var/run/dmesg.boot:
//---------------------------------------------------------------
Copyright (c) 1992-1998 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California. All rights reserved.
FreeBSD 3.0-BETA #0: Mon Oct  5 04:49:12 SAT 1998
    nbm@rucus.ru.ac.za:/usr/src/sys/compile/RUCUS
Timecounter "i8254"  frequency 1193182 Hz  cost 2170 ns
CPU: Pentium/P54C (200.46-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x52c  Stepping=12
  Features=0x3bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,APIC>
real memory  = 134217728 (131072K bytes)
avail memory = 127762432 (124768K bytes)
Probing for devices on PCI bus 0:
chip0: <Intel 82439> rev 0x03 on pci0.0.0
chip1: <Intel 82371SB PCI to ISA bridge> rev 0x01 on pci0.7.0
ide_pci0: <Intel PIIX3 Bus-master IDE controller> rev 0x00 on pci0.7.1
ahc0: <Adaptec aic7880 Ultra SCSI adapter> rev 0x00 int a irq 11 on pci0.12.0
ahc0: Using left over BIOS settings
ahc0: aic7880 Wide Channel A, SCSI Id=5, 16/255 SCBs
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
ed0 at 0x300-0x31f irq 3 on isa
ed0: address 00:00:e8:1c:7b:57, type NE2000 (16 bit) 
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1 not found at 0x2f8
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
lpt1 not found
mse0 not found at 0x23c
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <WDC AC33100H>
wd0: 3020MB (6185088 sectors), 6136 cyls, 16 heads, 63 S/T, 512 B/S
wdc1 not found at 0x170
wt0 not probed due to I/O address conflict with ed0 at 0x300
mcd0 not probed due to I/O address conflict with ed0 at 0x300
matcdc0 not found at 0x230
scd0 not found at 0x230
npx0 on motherboard
npx0: INT 16 interface
Intel Pentium F00F detected, installing workaround
IP packet filtering initialized, divert disabled, rule-based forwarding disabled, unlimited logging
Sending WDTR!
(probe2:ahc0:0:2:0): Sending SDTR!!
sa0 at ahc0 bus 0 target 1 lun 0
sa0: <SONY SDT-7000 0195> Removable Sequential Access SCSI2 device 
sa0: 10.0MB/s transfers (10.0MHz, offset 15)
changing root device to da1s1a
da1 at ahc0 bus 0 target 6 lun 0
da1: <IBM DORS-32160 S82C> Fixed Direct Access SCSI2 device 
da1: 10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
da1: 2063MB (4226725 512 byte sectors: 64H 32S/T 2063C)
da0 at ahc0 bus 0 target 2 lun 0
da0: <WDIGTL ENTERPRISE 1.70> Fixed Direct Access SCSI2 device 
da0: 40.0MB/s transfers (20.0MHz, offset 8, 16bit)
da0: 4157MB (8515173 512 byte sectors: 64H 32S/T 4157C)
//---------------------------------------------------------------

Ok, the problem is this:

When we enable SMP support, the machine dies with SCSI errors anywhere from
an hour to 6 days after boot.  Of late, "SCB timeout handled by another
timeout" is, I think, the proffered explanation.  The "death" seems to occur
shortly after extensive access to the disks, but the machine also just dies
arbitrarily, usually after it has been up for a few days.  It doesn't seem to
be specific to any one drive failing either (we've swapped drives around,
etc.).

("die" is a technical term here meaning either a reboot just after a SCSI
error pops up for a few seconds, or a hang after a SCSI error pops up.)
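
To put a number on how often the errors show up, one could count matching
kernel messages in a saved log.  This is only a sketch: count_scsi_timeouts
is a made-up name, the default pattern is taken from the message quoted
above (which is itself quoted from memory, so the real kernel text may
differ), and the log file is whichever file the console messages end up in.

```shell
#!/bin/sh
# Hypothetical helper: print how many lines of a log file match a pattern.
#   $1 = log file to scan
#   $2 = optional grep pattern (default is a guess based on the quoted message)
count_scsi_timeouts() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    # grep -c prints the number of matching lines; it exits with status 1
    # when that number is 0, so the status is masked here.
    grep -c -e "${2:-SCB.*timeout}" "$1" || true
}
```

For example, "count_scsi_timeouts /var/log/messages" prints a single number;
a second argument overrides the pattern if the actual message turns out to
read differently.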

The motherboard is a GigaByte GA586DX with onboard AIC7880 SCSI controller.
The BIOS has been updated from 1.0 to 3.43 to no avail.  I'm looking for
anything to do with the SCSI controller too, but nothing seems to be out
there.

We have both 16bit and 8bit devices on it, and it is terminated correctly
according to both the motherboard manual and tons of testing (terminator on
the last device on each SCSI connection, with high-bit termination on and
low-bit termination off on the motherboard).

These errors have been occurring much more often recently: they happened only
occasionally about a year ago, and now happen _extremely_ fast (usually
within 4 days, sometimes a whole week) if we have SMP enabled.  We've yet to
have the same problem without SMP, though.

I realize that it's incredibly likely to be the hardware; I was just hoping
that I'm wrong in this regard, since we're stuck with this hardware for a
few months, and we're kinda used to having huge uptimes, CPU power, and
similar things, compared to the Microsoft house that is the Information
Systems department.

Anyway, any and all help would be appreciated.  (although I'll understand if
everyone ignores this for a few days whilst furious coding occurs on the new
release)

Neil
-- 
Neil Blakey-Milner
nbm@rucus.ru.ac.za

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message


