Date:      Fri, 24 Apr 1998 14:11:00 -0500
From:      Patrick Hartling <mystify@friley63.res.iastate.edu>
To:        "Justin T. Gibbs" <gibbs@plutotech.com>
Cc:        scsi@FreeBSD.ORG
Subject:   Re: CAM == CAM Ate my Machine (and severly corrupted file systems too) 
Message-ID:  <199804241911.OAA04442@friley63.res.iastate.edu>
In-Reply-To: Message from "Justin T. Gibbs" <gibbs@plutotech.com>  of "Fri, 24 Apr 1998 11:55:39 MDT." <199804241759.LAA02289@pluto.plutotech.com> 

"Justin T. Gibbs" <gibbs@plutotech.com> wrote:

} >The intention of this message is to warn people of the possibility of serious
} >disk corruption when using CAM + SMP + ccd.
} 
} It is likely CAM + BT-958...

Yes, I'm now quite certain you are correct.  I should have been a little more
clear in my previous message about which partitions are on which disks and
which controllers.  Every partition that ended up being corrupted (i.e., /var,
/usr and /home) is on the Viking disk that is part of the BT-958 bus.

} >This morning when I got back from class, I discovered that my machine had
} >apparently gotten hungry and had eaten itself.  It had been very stable for
} >10 days running an SMP kernel with the CAM patches (built April 13, 1998),
} >but then this happened.  Unfortunately, I don't know what caused this, but
} >it certainly caused me a lot of stress this morning.
} 
} Was it wedged, did it panic, or was it running normally until some
} operation you attempted failed?

When I got back, it was waiting for me to provide a path to root's shell.
fsck could not find the super block for the /var partition.  I assume what
happened was that the machine panic'd and rebooted while I was gone.  I
don't know how long it had been in that state, but my roommate informed me
that he had heard the disks grinding just a few minutes before I got back.

} Which disk and controller contain /var?  Is it part of your CCD array?

It is not part of the CCD array.

} >However, the real horror story was the complete loss of my home directory.
} >BUT I have /home on the mirrored ccd, and the second partition in the ccd
} >was fully intact by some miracle.  :)
} 
} It was probably on the Adaptec controller - the most well tested of the
} controller drivers for CAM.

Thankfully!  :)  If it weren't for that, I'd be one unhappy camper right now.

} >Once I found that the second partition
} >was fine, I tried to do:
} >
} >	dd if=/dev/rda2s1e of=/dev/rda1s1e bs=64k
} >
} >but it kept saying that rda1s1e was a read-only filesystem.
} 
} My guess is that this error is coming from dsopen(), but I don't know why.
} I can't see how this could be a CAM problem.

I don't either.  I just noted it because it seemed weird.

} >Since getting everything more or less back to normal, I have crashed my
} >machine again today by accidentally doing:
} >
} >	disklabel -r sd4c
} 
} This should not be able to crash your system.  Disklabel should simply open
} up the device by that name in /dev and, should it exist, it will take
} it directly to the da driver.  My guess is that there was still some latent
} corruption in '/' that caused a panic.

That's possible.  My root partition is on the WD disk that's also part of
the BT-958 bus.  I hadn't considered that possibility, but I was certainly
taken aback when it did cause a crash.

} When you are recovering your 
} system or leaving it unattended, please leave the console switched to
} VTY0 so that console messages can be captured should an error occur.  
} Unless you have a serial console, you will never be able to get to the
} useful information for fixing problems like this if you are in X.

I will do that in the future.  I ran the above command remotely, and not as
root, just to verify that I had screwed up a disklabel a while ago (which I
had).

} A few words about your BT-958. Ensure that you are running good firmware on
} your card.  Leonard Zubkoff has a great page that talks about BT firmware
} issues with links to known good firmware:
} 
} 	http://www.dandelion.com/Linux/BusLogic.html

I will definitely look into this ASAP.  Thank you for the info.

} You are also the first person to report using the BT-958 with this driver.
} There are bound to be "some" problems with it as the driver was written
} from the ground up and was only tested by me on an older BT-948.  Can you
} send me the dmesg output from your system?  Was there any noticeable change
} performance wise in the system after switching to CAM?

I haven't had a chance to run any kind of benchmarks with CAM vs the old
SCSI system.  I'll have to do that as soon as I get time because I think it
would be very interesting to see what kind of performance I'm getting now.
I'm happy to report that things do "feel" faster though--especially with Jaz
disks.

Here's the dmesg output:

d: 12, version: 0x00040011, at 0xfec08000
 io0 (APIC): apic id: 13, version: 0x00170011, at 0xfec00000
Probing for devices on PCI bus 0:
chip0: <Intel 82440FX (Natoma) PCI and memory controller> rev 0x02 on pci0.0.0
fxp0: <Intel EtherExpress Pro 10/100B Ethernet> rev 0x01 int a irq 18 on pci0.6.0
fxp0: Ethernet address 00:a0:c9:14:0d:5f
chip1: <Intel 82371SB PCI to ISA bridge> rev 0x01 on pci0.7.0
chip2: <Intel 82371SB USB host controller> rev 0x01 int d irq 11 on pci0.7.2
ahc0: <Adaptec aic7880 Ultra SCSI adapter> rev 0x00 int a irq 17 on pci0.9.0
ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
bt0: <Buslogic Multimaster SCSI host adapter> rev 0x08 int a irq 16 on pci0.11.0
bt0: BT-958 FW Rev. 5.05R Ultra Wide SCSI Host Adapter, SCSI ID 7, 192 CCBs
vga0: <Matrox MGA 2064W graphics accelerator> rev 0x01 int a irq 17 on pci0.15.0
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
pcm0 at 0x530 irq 5 drq 1 flags 0xa610 on isa
mss_attach <mss>0 at 0x530 irq 5 dma 1:0 flags 0xa610
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
psm0 at 0x60-0x64 irq 12 on motherboard
psm0: model MouseMan+, device ID 0
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
npx0 on motherboard
npx0: INT 16 interface
APIC_IO: routing 8254 via 8259 on pin 0
ccd0-3: Concatenated disk drivers
SMP: AP CPU #1 Launched!
bt0: bt_cmd: Timeout waiting for adapter ready, status = 0x0
bt0: btfetchtransinfo - Inquire Setup Info Failed
(probe19:bt0:0:4:0): MODE SENSE(06). CDB: 1a 0 a 0 14 0 
(probe19:bt0:0:4:0): ILLEGAL REQUEST asc:24,0
(probe19:bt0:0:4:0): Invalid field in CDB
da2 at ahc0 bus 0 target 0 lun 0
da2: <QUANTUM VIKING 4.5 WSE 880R> Fixed Direct Access SCSI2 device 
da2: Serial Number 174721630980
da2: 40.0MB/s transfers (20.0MHz, offset 8, 16bit), Tagged Queueing Enabled
da2: 4345MB (8899737 512 byte sectors: 255H 63S/T 553C)
da1 at bt0 bus 0 target 1 lun 0
da1: <QUANTUM VIKING 4.5 WSE 880R> Fixed Direct Access SCSI2 device 
da1: Serial Number 174721632608
da1: 20.0MB/s transfers (20.0MHz, offset 15), Tagged Queueing Enabled
da1: 4345MB (8899737 512 byte sectors: 255H 63S/T 553C)
(da4:bt0:0:4:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0 
(da4:bt0:0:4:0): NOT READY asc:3a,0
(da4:bt0:0:4:0): Medium not present
da4 at bt0 bus 0 target 4 lun 0
da4: <iomega jaz 1GB J^77> Removable Direct Access SCSI2 device 
da4: 10.0MB/s transfers (10.0MHz, offset 15)
da4: Attempt to query device size, failed
cd0 at bt0 bus 0 target 3 lun 0
cd0: <TEAC CD-ROM CD-516S 1.0G> Removable CD-ROM SCSI2 device 
cd0: Serial Number \^_
cd0: 10.0MB/s transfers (10.0MHz, offset 8)
cd0: Attempt to query device size failed: NOT READY, Medium not present
da0 at bt0 bus 0 target 0 lun 0
da0: <WDIGTL ENTERPRISE 1.61> Fixed Direct Access SCSI2 device 
da0: Serial Number WS7000054039
da0: 3.300MB/s transfers , Tagged Queueing Enabled
da0: 2077MB (4254819 512 byte sectors: 255H 63S/T 264C)
ccd0: mirror/parity forces uniform flag
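
Since the question of which disks sit on which controller came up above, here
is a minimal sketch in Python of pulling that mapping out of dmesg text like
the output above.  It assumes attach lines of the form
"da1 at bt0 bus 0 target 1 lun 0"; the function name devices_by_controller is
made up for illustration and is not part of any FreeBSD tool.

```python
import re

# Matches attach lines such as "da1 at bt0 bus 0 target 1 lun 0".
# ISA lines like "sio0 at 0x3f8-0x3ff irq 4 on isa" do not match because
# the controller name must start with letters and be followed by
# " bus N target N lun N".
ATTACH_RE = re.compile(
    r"^(?P<dev>[a-z]+\d+) at (?P<ctrl>[a-z]+\d+) "
    r"bus (?P<bus>\d+) target (?P<target>\d+) lun (?P<lun>\d+)$"
)


def devices_by_controller(dmesg_text):
    """Return {controller: [devices]} in the order the devices attached."""
    mapping = {}
    for line in dmesg_text.splitlines():
        match = ATTACH_RE.match(line.strip())
        if match:
            mapping.setdefault(match.group("ctrl"), []).append(match.group("dev"))
    return mapping


if __name__ == "__main__":
    sample = """\
sio0 at 0x3f8-0x3ff irq 4 on isa
da2 at ahc0 bus 0 target 0 lun 0
da1 at bt0 bus 0 target 1 lun 0
da4 at bt0 bus 0 target 4 lun 0
cd0 at bt0 bus 0 target 3 lun 0
da0 at bt0 bus 0 target 0 lun 0
"""
    for ctrl, devs in devices_by_controller(sample).items():
        print(ctrl, "->", ", ".join(devs))
```

Run against the output above, this would group da2 under ahc0 and da1, da4,
cd0 and da0 under bt0, which matches the corruption showing up only on disks
attached to the BT-958.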

 -Patrick


Patrick L. Hartling			| Research Assistant, ICEMT
mystify@friley63.res.iastate.edu	| SE Lab - 1117 Black Engineering
http://www.public.iastate.edu/~oz	| http://www.icemt.iastate.edu

