Date:      Mon, 14 Dec 1998 10:07:13 -0600 (CST)
From:      Igor Roshchin <igor@physics.uiuc.edu>
To:        spork@super-g.com (spork)
Cc:        stable@FreeBSD.ORG
Subject:   Re: CAM and -stable
Message-ID:  <199812141607.KAA25672@alecto.physics.uiuc.edu>
In-Reply-To: <Pine.BSF.4.00.9812141026360.25881-100000@super-g.inch.com> from "spork" at "Dec 14, 1998 10:29: 8 am"

Hi, Charles,

This picture (the messages in your first e-mail) is almost identical
to what I had on one machine running 2.1-stable (after 2.1.7.1).
The hardware was: Adaptec 2940UW + Megadrive tower with a 4.x GB HDD
(I believe from Western Digital, or maybe Seagate)
+ Quantum (850MB) internal SCSI drive.

We decided that the problem was caused by an unlucky combination of the
Adaptec 2940, the imperfect driver for it in 2.1-stable, and the Quantum HDD.

In my case it also gave such messages, sometimes hanging and sometimes
not; I think that depended on whether another job was waiting
for the disk.

When we had to remove the Megadrive tower and install another
internal SCSI disk, the problem became more severe:
under almost any intensive disk usage (like the "weekly" script, where
we rotate and analyze our web and ftp logs) it hangs,
now writing nothing to the syslog, only to the console.

I doubt this helps, but...
If you solve your problem, would you please let me know what the cause was
and how you solved it?
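
In case it is useful for comparing how often this happens on each machine,
here is a rough sketch of a script that pulls the "SCB ... timed out" lines
out of a syslog file. The regular expression is only an assumption based on
the messages quoted in this thread, and the function name find_timeouts is
made up for illustration; other driver versions may word the message
differently.

```python
import re

# Pattern guessed from the messages quoted in this thread, e.g.
#   Dec 14 10:07:42 shell /kernel: (da0:ahc0:0:0:0): SCB 0x1c - timed out
# It is an assumption, not a description of what the driver always prints.
TIMEOUT_RE = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+/kernel:\s+"
    r"\((?P<dev>\w+):(?P<ctrl>\w+):[\d:]+\):\s+"
    r"SCB\s+(?P<scb>0x[0-9a-fA-F]+)\s+-\s+timed out"
)


def find_timeouts(lines):
    """Return (timestamp, device, controller, scb) for each SCB timeout line."""
    hits = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            hits.append(
                (m.group("stamp"), m.group("dev"), m.group("ctrl"), m.group("scb"))
            )
    return hits


if __name__ == "__main__":
    # Two lines copied from the quoted messages; only the second is a timeout.
    sample = [
        "Dec 14 02:01:21 shell /kernel: (da0:ahc0:0:0:0): tagged openings now 30",
        "Dec 14 10:07:42 shell /kernel: (da0:ahc0:0:0:0): SCB 0x1c - timed out",
    ]
    for hit in find_timeouts(sample):
        print(*hit)
```

To run it against a real log, pass an open file instead of the sample list,
for example find_timeouts(open("/var/log/messages")).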

Thanks,

Igor


> Hi,
> 
> FWIW, I got the same message today, but the machine didn't lock up.  Any
> ideas?  Anyone?  I'm cc-ing stable this time in hopes of finding someone
> running cam under -stable...
> 
> Here's the messages:
> 
> Dec 12 03:44:18 shell /kernel: (da1:ahc0:0:0:1): tagged openings now 31
> Dec 13 02:01:04 shell /kernel: (da1:ahc0:0:0:1): tagged openings now 2
> Dec 14 02:01:21 shell /kernel: (da0:ahc0:0:0:0): tagged openings now 30
> Dec 14 10:07:42 shell /kernel: (da0:ahc0:0:0:0): SCB 0x1c - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> Dec 14 10:07:45 shell /kernel: SEQADDR == 0x8
> Dec 14 10:07:45 shell /kernel: SSTAT1 == 0xa
> Dec 14 10:07:45 shell /kernel: (da0:ahc0:0:0:0): Queuing a BDR SCB
> Dec 14 10:07:45 shell /kernel: (da0:ahc0:0:0:0): Bus Device Reset Message Sent
> Dec 14 10:07:45 shell /kernel: (da0:ahc0:0:0:0): no longer in timeout, status = 34b
> Dec 14 10:07:45 shell /kernel: ahc0: Bus Device Reset Sent. 1 SCBs aborted 
> 
> Charles
> 
> ---
> Charles Sprickman
> spork@super-g.com
> --- 
>                      "...there's no idea that's so good you can't 
>                       ruin it with a few well-placed idiots." 
> 
> On Fri, 11 Dec 1998, spork wrote:
> 
> > Hi,
> > 
> > I'm about to put two new machines in production, and they're both "core"
> > machines; main dns/auth/mail and a shell machine.  Currently the machines
> > we use in this capacity are 2.1.7.1, and it's been very stable.
> > 
> > Now the new machines share a RAID array hung off of a CMD CRD-5440.  I
> > patched our usual build (980825 -stable) with the July CAM patchkit, as
> > the existing AHC driver couldn't detect any LUNs beyond the first one.
> > 
> > All has been well so far, I've tried to stress the machines as much as
> > possible by running some disk benchmarks over and over, but yesterday one
> > locked up (console frozen) with the following messages being the last
> > thing on the console:
> > 
> > Dec 10 18:13:15 shell /kernel: (da0:ahc0:0:0:0): SCB 0x1e - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> > Dec 10 18:13:18 shell /kernel: SEQADDR == 0xa
> > Dec 10 18:13:18 shell /kernel: SSTAT1 == 0xb
> > Dec 10 18:13:18 shell /kernel: (da0:ahc0:0:0:0): Queuing a BDR SCB
> > Dec 10 18:13:18 shell /kernel: (da0:ahc0:0:0:0): Bus Device Reset Message Sent
> > Dec 10 18:13:18 shell /kernel: (da0:ahc0:0:0:0): no longer in timeout, status = 34b
> > Dec 10 18:13:18 shell /kernel: ahc0: Bus Device Reset Sent. 2 SCBs aborted
> > 
> > I had to give it a hard reset at this point.
> > 
> > So my questions are:  Is this a known issue?  Does it point to a possible
> > hardware problem?  Will there be a newer cam patchkit for -stable?
> > 
> > I don't think it's a cabling issue, as this is the first I've seen of any
> > anomolies with the scsi subsystem, and the only cabling in question here
> > is a high quality 2' external UW scsi cable from the back of this machine
> > to the RAID array.  The other machine that uses the other host port on the
> > RAID array remained functional during this glitch...
> > 
> > Any ideas?  I was very comfortable with CAM before, but now I'm a little
> > nervous about moving this into production.  Would it be better to try and
> > back out of the patches and use the ahc driver?  Let me know if there's
> > any other info needed.
> > 
> > Following are the boot messages...
> > 
> > Thanks,
> > 
> > Charles
> > 
> > Dec 10 19:27:32 shell /kernel: Copyright (c) 1992-1998 FreeBSD Inc.
> > Dec 10 19:27:32 shell /kernel: Copyright (c) 1982, 1986, 1989, 1991, 1993
> > Dec 10 19:27:32 shell /kernel: The Regents of the University of California.  All rights reserved.
> > Dec 10 19:27:32 shell /kernel: 
> > Dec 10 19:27:32 shell /kernel: FreeBSD 2.2.7-19980825-SNAP #0: Thu Dec 10 12:02:45 EST 1998
> > Dec 10 19:27:32 shell /kernel: spork@shell.inch.com:/usr/src/sys/compile/SHELL
> > Dec 10 19:27:32 shell /kernel: CPU: Pentium II (quarter-micron) (350.80-MHz 686-class CPU)
> > Dec 10 19:27:32 shell /kernel: Origin = "GenuineIntel"  Id = 0x651  Stepping=1
> > Dec 10 19:27:32 shell /kernel: Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,<b16>,<b17>,MMX,<b24>>
> > Dec 10 19:27:32 shell /kernel: real memory  = 268435456 (262144K bytes)
> > Dec 10 19:27:32 shell /kernel: avail memory = 261144576 (255024K bytes)
> > Dec 10 19:27:32 shell /kernel: Probing for devices on PCI bus 0:
> > Dec 10 19:27:32 shell /kernel: chip0 <generic PCI bridge (vendor=8086 device=7190 subclass=0)> rev 2 on pci0:0:0
> > Dec 10 19:27:32 shell /kernel: chip1 <generic PCI bridge (vendor=8086 device=7191 subclass=4)> rev 2 on pci0:1:0
> > Dec 10 19:27:32 shell /kernel: chip2 <Intel 82371AB PCI-ISA bridge> rev 2 on pci0:4:0
> > Dec 10 19:27:32 shell /kernel: chip3 <Intel 82371AB IDE interface> rev 1 on pci0:4:1
> > Dec 10 19:27:32 shell /kernel: chip4 <Intel 82371AB USB interface> rev 1 int d irq 12 on pci0:4:2
> > Dec 10 19:27:32 shell /kernel: chip5 <Intel 82371AB Power management controller> rev 2 on pci0:4:3
> > Dec 10 19:27:32 shell /kernel: fxp0 <Intel EtherExpress P
> > Dec 10 19:27:32 shell /kernel: ro 10/100B Ethernet> rev 5 int a irq 10 on pci0:7:0
> > Dec 10 19:27:32 shell /kernel: fxp0: Ethernet address 00:e0:18:90:36:4d
> > Dec 10 19:27:32 shell /kernel: ahc0 <Adaptec 2940 Ultra SCSI adapter> rev 1 int a irq 12 on pci0:9:0
> > Dec 10 19:27:32 shell /kernel: ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
> > Dec 10 19:27:32 shell /kernel: fxp1 <Intel EtherExpress Pro 10/100B Ethernet> rev 5 int a irq 10 on pci0:10:0
> > Dec 10 19:27:32 shell /kernel: fxp1: Ethernet address 00:a0:c9:e7:ac:7d
> > Dec 10 19:27:32 shell /kernel: vga0 <VGA-compatible display device> rev 211 int a irq 11 on pci0:11:0
> > Dec 10 19:27:32 shell /kernel: Probing for devices on PCI bus 1:
> > Dec 10 19:27:32 shell /kernel: Probing for devices on the ISA bus:
> > Dec 10 19:27:32 shell /kernel: sc0 at 0x60-0x6f irq 1 on motherboard
> > Dec 10 19:27:32 shell /kernel: sc0: VGA color <16 virtual consoles, flags=0x0>
> > Dec 10 19:27:32 shell /kernel: sio0 at 0x3f8-0x3ff irq 4 on isa
> > Dec 10 19:27:32 shell /kernel: sio0: type 16550A
> > Dec 10 19:27:32 shell /kernel: sio1 at 0x2f8-0x2ff irq 3 on isa
> > Dec 10 19:27:32 shell /kernel: sio1: type 16550A
> > Dec 10 19:27:32 shell /kernel: lpt0 at 0x378-0x37f irq 7 on isa
> > Dec 10 19:27:32 shell /kernel: lpt0: Interrupt-driven port
> > Dec 10 19:27:32 shell /kernel: lp0: TCP/IP capable interface
> > Dec 10 19:27:32 shell /kernel: fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
> > Dec 10 19:27:32 shell /kernel: fdc0: FIFO enabled, 8 bytes threshold
> > Dec 10 19:27:32 shell /kernel: fd0: 1.44MB 3.5in
> > Dec 10 19:27:32 shell /kernel: npx0 flags 0x1 on motherboard
> > Dec 10 19:27:32 shell /kernel: npx0: INT 16 interface
> > Dec 10 19:27:32 shell /kernel: IP packet filtering initialized, divert enabled, logging limited to 200 packets/entry
> > Dec 10 19:27:32 shell /kernel: da0 at ahc0 bus 0 target 0 lun 0
> > Dec 10 19:27:32 shell /kernel: da0: <CMD TECH CRD-5440-1 C1-5> Fixed Direct Access SCSI2 device 
> > Dec 10 19:27:32 shell /kernel: da0: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
> > Dec 10 19:27:32 shell /kernel: da0: 6999MB (14335872 512 byte sectors: 64H 32S/T 6999C)
> > Dec 10 19:27:32 shell /kernel: da1 at ahc0 bus 0 target 0 lun 1
> > Dec 10 19:27:32 shell /kernel: da1: <CMD TECH CRD-5440-1 C1-5> Fixed Direct Access SCSI2 device 
> > Dec 10 19:27:32 shell /kernel: da1: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
> > Dec 10 19:27:32 shell /kernel: da1: 10431MB (21362688 512 byte sectors: 64H 32S/T 10431C)
> > Dec 10 19:27:32 shell /kernel: WARNING: / was not properly dismounted.
> > Dec 10 19:27:32 shell /kernel: nfs server 10.0.0.1:/var/mail: not responding
> > Dec 10 19:27:32 shell savecore: no core dump
> > 
> > ---
> > Charles Sprickman
> > spork@super-g.com
> > --- 
> >                      "...there's no idea that's so good you can't 
> >                       ruin it with a few well-placed idiots." 
> > 
> > 
> > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > with "unsubscribe freebsd-scsi" in the body of the message
> > 
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message
> 


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message


