Date:      Sat, 02 Dec 2000 23:49:24 +0000
From:      Peter Gradwell <peter@gradwell.com>
To:        Mike Smith <msmith@freebsd.org>
Cc:        freebsd-scsi@freebsd.org
Subject:   Re: Mylex DAC960 Driver "online/offline" 
Message-ID:  <5.0.0.25.0.20001202233356.0366b2d8@pop3.gradwell.net>
In-Reply-To: <200012022339.eB2NdWF21371@mass.osd.bsdi.com>
References:  <Your message of "Fri, 01 Dec 2000 21:32:54 GMT." <5.0.0.25.0.20001201212649.03798548@pop3.gradwell.net>

Hi Mike,

At 15:39 02/12/2000 -0800, Mike Smith wrote:
> > What does this message really mean?
>
>It means that the controller is telling us that the drive is offline.
>Then that it's online.  Then that it's offline again.
>
>You don't say what the time intervals between these messages are; you can
>get the 'drive offline' message from either the status poll (once per
>second) or if an I/O operation is sent to a drive that the controller
>reports as offline.  The 'drive online' message only comes from the
>status poll though.

It was occurring without any apparent activity, about once per second,
so I would guess it was from the status poll.
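
A rough way to check that guess, assuming the times of the messages have
been noted down in seconds, is to look at the gaps between them. This is
only a sketch: the function names, the tolerance and the 80% threshold
are made up for illustration and have nothing to do with the driver
itself.

```python
def gaps(timestamps):
    """Return the intervals (seconds) between consecutive event times."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_like_status_poll(timestamps, period=1.0, tolerance=0.25, fraction=0.8):
    """Guess whether events arrive on a fixed polling cadence.

    period    -- assumed poll interval (once per second, per Mike's note)
    tolerance -- hypothetical allowed jitter around that interval
    fraction  -- hypothetical share of gaps that must match the cadence
    """
    intervals = gaps(timestamps)
    if not intervals:
        return False  # fewer than two events: nothing to measure
    near = sum(1 for gap in intervals if abs(gap - period) <= tolerance)
    return near / len(intervals) >= fraction
```

Messages that arrive in irregular bursts would point at failed I/O
requests instead, since those can also produce the 'drive offline'
message.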

>Can you describe your configuration?  I can try to reproduce the
>situation here and see if it's not possible that there's a bug in the
>driver confusing the status between your two drives.  I have to say,
>though, that the fact that the controller thinks that one of your system
>drives is offline when you claim it's a mirror is a bit troubling.

OK, an update on the situation though: I was able to get to the
Mylex BIOS (there are 250 miles between me and the machine, you see!)
via a serial console, and discovered that it had marked two drives offline.

We have:
         3 x 18 gig disks, of which two are bonded in a raid 1 pack
         and one is a hot spare
         2 x 36 gig disks, bonded in a raid 0 pack.

Everything apart from /var/spool/news is on the raid 1 pack. (Yeah, it's
a news server.)

One of the 18 gig disks and one of the 36 gig disks were marked offline.

I believe that when the 18 gig disk was marked offline the RAID card
rebuilt its redundancy data onto the hot spare disk and carried on,
because the 18 gig disk which is offline was part of the RAID 1 pack and
there is now no hot spare. *So, that's good.*
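
To spell out why I think the two packs behaved differently, here is a toy
model of generic RAID 0 / RAID 1 behaviour. It is only a sketch under
that assumption: `Pack` and `mark_offline` are names invented for
illustration and say nothing about how the Mylex firmware or the FreeBSD
driver actually work.

```python
from dataclasses import dataclass


@dataclass
class Pack:
    level: int        # 0 = striping (no redundancy), 1 = mirroring
    members: int      # disks configured in the pack
    offline: int = 0  # members currently marked offline

    def usable(self):
        """A stripe needs every member; a mirror needs at least one."""
        if self.level == 0:
            return self.offline == 0
        if self.level == 1:
            return self.offline < self.members
        raise ValueError("only RAID 0 and RAID 1 are modelled here")


def mark_offline(pack, spares):
    """Mark one member offline and return the number of hot spares left.

    A mirror that still has a good copy rebuilds onto a spare, if one is
    available. A stripe has no redundant copy to rebuild from.
    """
    pack.offline += 1
    if pack.level == 1 and pack.usable() and spares > 0:
        pack.offline -= 1  # the spare takes the failed member's place
        spares -= 1
    return spares


mirror = Pack(level=1, members=2)  # the two 18 gig disks
stripe = Pack(level=0, members=2)  # the two 36 gig disks
spares = 1                         # the third 18 gig disk

spares = mark_offline(mirror, spares)
spares = mark_offline(stripe, spares)
```

After both calls the mirror is still usable with no spare left, while the
stripe is not usable at all, which matches what the BIOS showed.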

So, we hard reset the machine and it booted. However, the symptoms
described previously persisted. We couldn't log in via ssh or on the
console, as it was unresponsive.

* This worries me. I would hope the machine would take the loss of
/var/spool/news gracefully, and carry on.

So, when I accessed the BIOS this morning, I tried, as an "experiment",
to put the 36 gig disk back online and rebooted. After running fsck
for a bit (is there a journaling file system for FreeBSD?!) the machine
is now running OK.

I have yet to schedule a reboot to mark the currently offline 18 gig
disk as the hot spare. I think I will be able to do this.

I am worried that the controller randomly marks the drives offline. Mylex
tell me this happens when it loses contact with the drives.
They are internal drives, well screwed into a big case, nicely racked
into a locked cabinet in Telehouse Europe. From what I can gather, no
one accessed the rack. It appears they aren't disconnected anyway,
because I can mark them online and we're good to go again.

I'd be happy to help with more information if it helps. Directed questions
work best!

thanks

peter

--
peter gradwell; online @ http://www.gradwell.com/peter/


