Date:      Sun, 28 Nov 1999 10:33:07 -0600 (CST)
From:      Joe Greco <jgreco@ns.sol.net>
To:        ken@kdm.org (Kenneth D. Merry)
Cc:        dgilbert@velocet.ca, stable@freebsd.org
Subject:   Re: ahc problems (with vinum?)
Message-ID:  <199911281633.KAA55332@aurora.sol.net>
In-Reply-To: <199911280515.WAA19138_panzer.kdm.org@ns.sol.net> from "Kenneth D. Merry" at "Nov 28, 1999  5:16: 2 am"

> David Gilbert wrote...
> > >>>>> "Kenneth" == Kenneth D Merry <ken@kdm.org> writes:
> > Kenneth> David Gilbert wrote...
> > >> Several times, on a system I've been configuring and testing, I've
> > >> got some maddening ahc0 messages.  In general, they complain of a
> > >> timeout on the bus (I think some packet got lost)... and x SCBs are
> > >> aborted.
> > >>
> > >> At this point, some portion of the SCSI bus is unusable... and the
> > >> machine eventually hangs due to this.  It does claim that it's
> > >> resetting channel A of the ahc0 controller, but I gather it doesn't
> > >> do any good.
> > >>
> > >> I'm running 3.3-STABLE (as of thursday, I think) and am trying to
> > >> format and test an 8-drive vinum RAID-5 array.
> >
> > Kenneth> You'll need to provide more information in order for anyone
> > Kenneth> to make sense of your problem.  Specifically, please post any
> > Kenneth> and all relevant kernel messages, including your controllers
> > Kenneth> and drives and the errors you've seen printed out, explain
> > Kenneth> your SCSI bus configuration, where it is terminated, etc.
> >
> > Kenneth> The #1 cause of problems is cabling and termination.  The
> > Kenneth> second most common cause of problems is bogus drive firmware.
> >
> > Kenneth> In any case, check your cabling and termination, as that is
> > Kenneth> the most likely problem.
> >
> > Regardless of termination, the SCSI bus reset should clear
> > things... the unit will run for hours just fine... get this one error
> > and hang.  It is difficult to copy down all the messages --- as they
> > don't get copied into the logs (since the SCSI bus is locked).
>
> Run a serial console on the box.  You'll get all the messages that way.
> Seriously, there's no way to adequately diagnose the problem without the
> specific error messages in question.  There are any number of conditions
> that can cause a timeout.
>
> > The controller is the 2940 U2W --- the one with a SE and an LVD
> > connector.  The LVD bus is connected to a professional 8 drive LVD
> > case which is connected and terminated with the supplied cables.  The
> > SE connector is connected to a single drive.
>
> And the SE drive is terminated as well?  Are the supplied cables and
> terminator for the LVD segment LVD cables/terminators?

Ken,

Just having spent a week debugging a (very) intermittent SCSI bus problem,
I agree that I've seen some odd behaviour of this sort.  What's even more
exasperating is that, at least in some cases, it does appear to recover
the one device that erred, but the rest stop functioning.

I've got serial consoles on my machines, let me see if I can dig up...

Console: serial port
BIOS drive A: is disk0
BIOS drive C: is disk1
BIOS drive D: is disk2
BIOS drive E: is disk3
BIOS drive F: is disk4
BIOS drive G: is disk5
BIOS drive H: is disk6
BIOS drive I: is disk7
BIOS drive J: is disk8

FreeBSD/i386 bootstrap loader, Revision 0.7  640/65472kB
(jkh@highwing.cdrom.com, Thu Sep 16 22:16:41 GMT 1999)
Loading /boot/defaults/loader.conf
/kernel text=0x10a418 data=0x17b48+0x1a97c syms=[0x4+0x1ee30+0x4+0x206b3]
Hit [Enter] to boot immediately, or any other key for command prompt.

Type '?' for a list of commands, 'help' for more detailed help.
boot: host > boot -s
Copyright (c) 1992-1999 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California. All rights reserved.
FreeBSD 3.3-RELEASE #0: Mon Nov 22 13:38:07 CST 1999
    root@host:/usr/src/sys/compile/DEMO
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium II/Xeon/Celeron (686-class CPU)
  Origin = "GenuineIntel"  Id = 0x652  Stepping = 2
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory  = 536870912 (524288K bytes)
avail memory = 519716864 (507536K bytes)
Programming 24 pins in IOAPIC #0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  1, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc027e000.
Pentium Pro MTRR support enabled
Probing for devices on PCI bus 0:
chip0: <Intel 82443BX host to PCI bridge> rev 0x03 on pci0.0.0
chip1: <Intel 82443BX host to AGP bridge> rev 0x03 on pci0.1.0
chip2: <Intel 82371AB PCI to ISA bridge> rev 0x02 on pci0.4.0
chip3: <Intel 82371AB Power management controller> rev 0x02 on pci0.4.3
ahc0: <Adaptec aic7890/91 Ultra2 SCSI adapter> rev 0x00 int a irq 19 on pci0.6.0
ahc0: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs
hfa0: <FORE Systems PCA-200E ATM> rev 0x00 int a irq 19 on pci0.9.0
chip4: <PCI to PCI bridge (vendor=1011 device=0024)> rev 0x03 on pci0.10.0
ahc1: <Adaptec 2940 Ultra2 SCSI adapter> rev 0x00 int a irq 17 on pci0.11.0
ahc1: aic7890/91 Wide Channel A, SCSI Id=7, 16/255 SCBs
ahc2: <Adaptec 2940 Ultra SCSI adapter> rev 0x00 int a irq 16 on pci0.12.0
ahc2: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
Probing for devices on PCI bus 1:
Probing for devices on PCI bus 2:
de0: <Digital 21140A Fast Ethernet> rev 0x22 int a irq 18 on pci2.4.0
de0: SMC 9332BDT 21140A [10-100Mb/s] pass 2.2
de0: address 00:e0:29:3c:fb:84
de1: <Digital 21140A Fast Ethernet> rev 0x22 int a irq 19 on pci2.5.0
de1: SMC 9332BDT 21140A [10-100Mb/s] pass 2.2
de1: address 00:e0:29:3c:fb:85
Probing for PnP devices:
Probing for devices on the ISA bus:
sc0 on isa
sc0: VGA color <16 virtual consoles, flags=0x0>
atkbdc0 at 0x60-0x6f on motherboard
atkbd0 irq 1 on isa
psm0 not found
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A, console
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
sio2: configured irq 5 not in bitmap of probed irqs 0
sio2 not found at 0x3e8
sio3: configured irq 9 not in bitmap of probed irqs 0
sio3 not found at 0x2e8
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
npx0 on motherboard
npx0: INT 16 interface
we0 at 0x2e8 on isa
we0: kernel is keeping watchdog alive
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via pin 2
IP packet filtering initialized, divert disabled, rule-based forwarding disabled, logging limited to 100 packets/entry by default
ccd0-15: Concatenated disk drivers
Waiting 2 seconds for SCSI devices to settle
SMP: AP CPU #1 Launched!
de0: enabling 100baseTX port
chda1 at ahc1 bus 0 target 0 lun 0
da1: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da1: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da1: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da4 at ahc1 bus 0 target 3 lun 0
da4: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da4: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da4: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da7 at ahc1 bus 0 target 6 lun 0
da7: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da7: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da7: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da10 at ahc2 bus 0 target 0 lun 0
da10: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da10: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da10: 17366MB (35566480 512 byte sectors: 64H 32S/T 17366C)
da6 at ahc1 bus 0 target 5 lun 0
da6: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da6: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da6: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da13 at ahc2 bus 0 target 3 lun 0
da13: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da13: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da13: 17366MB (35566480 512 byte sectors: 64H 32S/T 17366C)
da5 at ahc1 bus 0 target 4 lun 0
da5: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da5: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da5: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da16 at ahc2 bus 0 target 6 lun 0
da16: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da16: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da16: 17366MB (35566480 512 byte sectors: 64H 32S/T 17366C)
da3 at ahc1 bus 0 target 2 lun 0
da3: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da3: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da3: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da2 at ahc1 bus 0 target 1 lun 0
da2: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da2: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da2: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da15 at ahc2 bus 0 target 5 lun 0
da15: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da15: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da15: 17366MB (35566480 512 byte sectors: 64H 32S/T 17366C)
da9 at ahc1 bus 0 target 9 lun 0
da9: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da9: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da9: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da14 at ahc2 bus 0 target 4 lun 0
da14: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da14: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da14: 17366MB (35566480 512 byte sectors: 64H 32S/T 17366C)
da8 at ahc1 bus 0 target 8 lun 0
da8: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da8: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da8: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da12 at ahc2 bus 0 target 2 lun 0
da12: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da12: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da12: 17366MB (35566480 512 byte sectors: 64H 32S/T 17366C)
da11 at ahc2 bus 0 target 1 lun 0
da11: <SEAGATE ST118273W 6244> Fixed Direct Access SCSI-2 device
da11: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da11: 17366MB (35566480 512 byte sectors: 64H 32S/T 17366C)
da0 at ahc0 bus 0 target 0 lun 0
da0: <IBM DDRS-34560W S97B> Fixed Direct Access SCSI-2 device
da0: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 4357MB (8925000 512 byte sectors: 255H 63S/T 555C)
anging root device to da0s2a
Enter full pathname of shell or RETURN for /bin/sh:
erase ^H, kill ^U, intr ^C
# mount -a
# cd ~de1: autosense failed: cable problem?
jgreco
# ls
.cshrc		.login.env	.logout		.path		run
.login		.login.env.old	.mailrc		.profile
# sh run&
# dd: /dev/rda17: Device not configured
dd: /dev/rda18: Device not configured
(da13:ahc2:0:3:0): SCB 0xa - timed out in datain phase, SEQADDR == 0x110
(da13:ahc2:0:3:0): Other SCB Timeout
(da11:ahc2:0:1:0): SCB 0xb - timed out in datain phase, SEQADDR == 0x110
(da11:ahc2:0:1:0): Other SCB Timeout
(da10:ahc2:0:0:0): SCB 0x9 - timed out in datain phase, SEQADDR == 0x110
(da10:ahc2:0:0:0): BDR message in message buffer
(da10:ahc2:0:0:0): SCB 0x9 - timed out in datain phase, SEQADDR == 0x10f
(da10:ahc2:0:0:0): no longer in timeout, status = 34b
ahc2: Issued Channel A Bus Reset. 7 SCBs aborted
(da11:ahc2:0:1:0): SCB 0xa - timed out in datain phase, SEQADDR == 0x153
(da11:ahc2:0:1:0): Other SCB Timeout
(da10:ahc2:0:0:0): SCB 0x9 - timed out in datain phase, SEQADDR == 0x153
(da10:ahc2:0:0:0): BDR message in message buffer
(da10:ahc2:0:0:0): SCB 0x9 - timed out in datain phase, SEQADDR == 0x153
(da10:ahc2:0:0:0): no longer in timeout, status = 34b
ahc2: Issued Channel A Bus Reset. 3 SCBs aborted
(da10:ahc2:0:0:0): SCB 0xa - timed out in datain phase, SEQADDR == 0x110
(da10:ahc2:0:0:0): BDR message in message buffer
(da10:ahc2:0:0:0): SCB 0xa - timed out in datain phase, SEQADDR == 0x10f
(da10:ahc2:0:0:0): no longer in timeout, status = 34b
ahc2: Issued Channel A Bus Reset. 6 SCBs aborted
4357+1 records in
4357+1 records out
4569600000 bytes transferred in 428.640450 secs (10660683 bytes/sec)

# reboot
Console: serial port
BIOS drive A: is disk0
BIOS drive C: is disk1
BIOS drive D: is disk2

run is a little script that sucks data in from all SCSI drives with dd and
dumps it to /dev/null, in parallel.
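
The run script itself was not posted. A minimal sketch of what such a script
might look like, assuming raw device names of the form /dev/rdaN (as in the dd
errors above) and a 1 MB block size (an assumption, though it is consistent
with the 4357+1 records reported for 4569600000 bytes). read_all is a
hypothetical helper name, not something from the original script:

```sh
#!/bin/sh
# Hypothetical reconstruction; the original "run" script was not shown.
# read_all DEVICE...
#   Starts one background dd reader per device, discarding the data,
#   then waits for every reader and returns how many of them failed.
read_all() {
    pids=""
    for dev in "$@"; do
        # bs=1024k is an assumed block size, not taken from the original script
        dd if="$dev" of=/dev/null bs=1024k &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of that particular dd
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}
```

Called as "read_all /dev/rda0 /dev/rda1 ...", all readers run in parallel, and
a device that is not configured shows up both in dd's error message and in the
non-zero return value.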

Now, when the bus reset happens, the drive named in the messages will often
recover and keep going, but when it does, the others typically stop (their dd
processes just sit waiting for data).  This isn't set in stone: I've seen all
drives drop off, and I've also seen the whole thing recover just fine.
I have no idea what the result was for the incident listed above.  It was
one of dozens of incidents.
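
With dozens of incidents in a serial console capture, it may help to tally
them. A rough sketch, assuming the console output has been saved to a
plain-text file and that the kernel messages keep the format shown above;
summarize_timeouts and the log file name are hypothetical:

```sh
#!/bin/sh
# summarize_timeouts LOGFILE
#   Counts "timed out" SCB messages per daN device and the number of
#   bus resets the ahc driver reported in a saved console log.
summarize_timeouts() {
    log=$1
    echo "SCB timeouts per device:"
    # Pull the device name out of lines like "(da10:ahc2:0:0:0): SCB ... timed out ..."
    sed -n 's/^(\(da[0-9]*\):.*timed out.*/\1/p' "$log" | sort | uniq -c
    echo "Bus resets:"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    grep -c 'Issued Channel . Bus Reset' "$log" || true
}
```

For example, "summarize_timeouts console.log" prints one count per device
followed by the total number of bus resets.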

The "reboot" bit is also mildly interesting.  FreeBSD (cam?) seems to have
lots of problems halting or rebooting in the event that a device is
unavailable or a scbus is hung.  I'd guess that it is waiting to flush some
buffers or something, except that my tests only do reads - no writes.

... Joe

-------------------------------------------------------------------------------
Joe Greco - Systems Administrator			      jgreco@ns.sol.net
Solaria Public Access UNIX - Milwaukee, WI			   414/342-4847


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message



