Date:      Fri, 24 Apr 1998 11:55:39 -0600
From:      "Justin T. Gibbs" <gibbs@plutotech.com>
To:        Patrick Hartling <mystify@friley63.res.iastate.edu>
Cc:        scsi@FreeBSD.ORG
Subject:   RE: CAM == CAM Ate my Machine (and severely corrupted file systems too)
Message-ID:  <199804241759.LAA02289@pluto.plutotech.com>

>The intention of this message is to warn people of the possibility of serious
>disk corruption when using CAM + SMP + ccd.  Other factors could be involved,
>but I've never had anything like this happen to me before in 2 years of
>running FreeBSD.

It is likely CAM + BT-958...

>This morning when I got back from class, I discovered that my machine had
>apparently gotten hungry and had eaten itself.  It had been very stable for
>10 days running an SMP kernel with the CAM patches (built April 13, 1998),
>but then this happened.  Unfortunately, I don't know what caused this, but
>it certainly caused me a lot of stress this morning.

Was it wedged, did it panic, or was it running normally until some
operation you attempted failed?

>My current disk configuration is three UW SCSI disks (two Quantum Vikings
>and one WD Enterprise) with one Viking and the Enterprise on a BusLogic
>BT-958 and the other Viking on the onboard Adaptec 2940UW.  I have a
>mirrored ccd across the two Viking disks.  Besides that, I'd say that
>everything concerning partitions/slices is fairly typical.  (I also have a
>Jaz disk and a CD-ROM drive plugged into the BusLogic controller.)
>
>At any rate, my /var was completely trashed.

Which disk and controller contain /var?  Is it part of your CCD array?

>fsck core dumped on it repeatedly. /usr was pretty well hosed too.
>Lots of files (mostly shared libraries) were removed by fsck.  This was
>easy to replace since my /usr/src and /usr/obj partitions were fully intact.
>'make install' saved the day here--once I got ld.so and libc.so.3.1 restored.

Please be more specific about where these file systems were located.  I 
need to know if the problem resides in the BT driver or somewhere else.

>However, the real horror story was the complete loss of my home directory.
>BUT I have /home on the mirrored ccd, and the second partition in the ccd was
>fully intact by some miracle.  :)

The intact half was probably the Viking on the Adaptec controller, whose
driver is the best tested of the CAM controller drivers.

>The first partition was thoroughly
>trashed.  Everything that was in my base directory ended up in lost+found, so
>I could have gotten it back if I had spent the time to go through each file
>and directory and rename everything.  Once I found that the second partition
>was fine, I tried to do:
>
>	dd if=/dev/rda2s1e of=/dev/rda1s1e bs=64k
>
>but it kept saying that rda1s1e was a read-only filesystem.

My guess is that this error is coming from dsopen(), but I don't know why.
I can't see how this could be a CAM problem.
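For reference, the copy attempted above can be sketched in C.  This is a
minimal illustration, not dd's actual source: the helper name copy_device()
and the 64 KiB buffer (mirroring bs=64k) are assumptions for the example.
It shows where a "read-only file system" error would typically surface:
that message is strerror(EROFS), and here it comes from the open() of the
output device for writing, before any data moves.  That fits the guess
that dsopen() is refusing the write open.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define COPY_BLOCK (64 * 1024) /* mirrors dd's bs=64k (assumption) */

/* Close fd without clobbering the errno we want to report. */
static void close_keep_errno(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
}

/*
 * Hypothetical helper: copy src_path to dst_path in COPY_BLOCK chunks,
 * roughly what "dd if=SRC of=DST bs=64k" does for a raw partition.
 * dst_path must already exist (as a device node does); it is not
 * truncated.  Returns 0 on success, -1 with errno set on failure.
 */
static int copy_device(const char *src_path, const char *dst_path)
{
    static char buf[COPY_BLOCK];
    int in, out;
    ssize_t n;

    if ((in = open(src_path, O_RDONLY)) < 0)
        return -1;
    /* A driver that refuses write access fails this open() with EROFS,
       which strerror() reports as "Read-only file system". */
    if ((out = open(dst_path, O_WRONLY)) < 0) {
        close_keep_errno(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;
        while (off < n) { /* write() may accept fewer bytes than asked */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                close_keep_errno(in);
                close_keep_errno(out);
                return -1;
            }
            off += w;
        }
    }
    if (n < 0) { /* read error */
        close_keep_errno(in);
        close_keep_errno(out);
        return -1;
    }
    close(in);
    if (close(out) < 0)
        return -1;
    return 0;
}
```

Calling copy_device() with the two raw partition paths and printing
strerror(errno) on failure would reproduce dd's diagnostic for the same
open.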

>Since getting everything more or less back to normal, I have crashed my
>machine again today by accidentally doing:
>
>	disklabel -r sd4c

This should not be able to crash your system.  Disklabel simply opens
the device of that name in /dev and, if the node exists, the open goes
directly to the da driver.  My guess is that some latent corruption
remained in '/' and caused a panic.  When you are recovering your
system or leaving it unattended, please leave the console switched to
VTY0 so that console messages can be captured should an error occur.
Unless you have a serial console, you will never see the information
needed to fix problems like this while you are in X.

>I'm still not fully used to the da stuff, but now that I have discovered
>mixing it up can be fatal to stability, I'll remember to be more careful.  :)

It shouldn't make a difference.  I've booted several times on systems where
all /dev entries were called "sdblah".
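The reason the name doesn't matter: on a system of this vintage the kernel
dispatches an open on a device node by the node's device number
(major/minor), not by the name of the /dev entry.  A minimal sketch,
assuming a POSIX system; the helper name device_number() is hypothetical,
and on glibc major()/minor() are declared in <sys/sysmacros.h>:

```c
#include <sys/stat.h>
#include <sys/types.h>
#ifdef __linux__
#include <sys/sysmacros.h> /* glibc declares major()/minor() here */
#endif

/*
 * Hypothetical helper: report the device number behind a /dev node.
 * Returns 0 for character or block devices, -1 for anything else
 * (including a path that cannot be stat()ed).
 */
static int device_number(const char *path, unsigned *maj, unsigned *min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
        return -1;
    *maj = (unsigned)major(st.st_rdev);
    *min = (unsigned)minor(st.st_rdev);
    return 0;
}
```

Comparing the results for two differently named nodes shows whether they
carry the same device number and therefore reach the same driver.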

>So, unless someone can tell me what mistakes I've made to cause all this, I
>would recommend that people be extra careful with using the current CAM code
>(even though I'm really impressed with it overall).

A few words about your BT-958. Ensure that you are running good firmware on
your card.  Leonard Zubkoff has a great page that talks about BT firmware
issues with links to known good firmware:

	http://www.dandelion.com/Linux/BusLogic.html

You are also the first person to report using the BT-958 with this driver.
There are bound to be "some" problems with it, as the driver was written
from the ground up and was only tested by me on an older BT-948.  Can you
send me the dmesg output from your system?  Was there any noticeable change
in performance after switching to CAM?

--
Justin

