Date:      Wed, 14 Nov 2007 15:37:41 -0600
From:      Derek Ragona <derek@computinginnovations.com>
To:        "Tamouh H." <hakmi@rogers.com>, "'Barnaby Scott'" <bds@waywood.co.uk>
Cc:        freebsd-questions@freebsd.org
Subject:   RE: Dell PE4600 RAID5 server failing
Message-ID:  <6.0.0.22.2.20071114153312.024eda70@mail.computinginnovations.com>
In-Reply-To: <039801c826e9$f1804550$6700a8c0@tamouh>
References:  <473B0D70.7020307@waywood.co.uk> <6.0.0.22.2.20071114094712.024bfe38@mail.computinginnovations.com> <473B34C5.4030300@waywood.co.uk> <039801c826e9$f1804550$6700a8c0@tamouh>

At 12:12 PM 11/14/2007, Tamouh H. wrote:
> >
> > Derek Ragona wrote:
> > > At 09:00 AM 11/14/2007, Barnaby Scott wrote:
> > >> I suspect I already know the answer to this, which is that the
> > >> trouble I am having is nothing to do with the OS at all, but I have
> > >> to ask, because I am otherwise up against a total brick wall!
> > >>
> > >> I bought a second-hand Dell Poweredge 4600 and installed FreeBSD 6.2
> > >> earlier this year. I had it set up with RAID5 using its PERC3/DC
> > >> controller, with 7 x 73GB disks (+ 1 hot spare). So far so good, and
> > >> it worked faultlessly as a Samba server for several months.
> > >>
> > >> At the beginning of October, it went down, reporting a mismatch
> > >> between the configuration on the NVRAM and the disks. With help from
> > >> Dell support, I managed to recreate the RAID array and it worked
> > >> again for a month.
> > >>
> > >> In early November it happened again, and has kept happening since. At
> > >> one point it appeared that the backplane was faulty, so I replaced
> > >> that, but I cannot keep the server up for more than a day or so
> > >> without this 'mismatch' problem.
> > >>
> > >> What about diagnostics on the hardware you may ask? I have run all
> > >> the diagnostic tools that Dell can supply - several times - and the
> > >> server declares itself to be totally fault-free.
> > >>
> > >> My specific questions therefore:
> > >>
> > >> Is there any way at all that FreeBSD could be involved with this
> > >> problem? (I did notice for example that the Dell PERC3/DC controller
> > >> was not in the list of supported hardware - but then again, why did
> > >> it work for several months?)
> > >>
> > >> Can I use FreeBSD to tell me anything about the fault that Dell's
> > >> diagnostic tools haven't found?
> > >>
> > >> (I do hope someone might be able to help - Dell are trying to get me
> > >> to switch to a 'supported' OS!)
> > >>
> > >>
> > >> Thanks
> > >>
> > >> Barnaby Scott
> > >
> > > It doesn't sound like an OS issue, as you set up the RAID outside the
> > > OS.  It may be a bad drive or drives.  Most RAID controllers write
> > > RAID configuration to the drives, and if this becomes unreadable you
> > > will have RAID faults.
> > >
> > > Another likely culprit is heat.  Overheating drives often fail.  Are
> > > you sure the temperature in the drive enclosure is OK?
> > >
> > > If you can, run diagnostics on the drives; this usually requires
> > > taking the drives out of the RAID array first, though.
> > >
> > >         -Derek
> > >
> >
> > Thanks for replying - as I said, this is a long shot trying
> > to see if there is any OS involvement.
> >
> > The drives are fine - I have used two different tools to
> > analyse them while the computer is booted from a live CD and
> > the RAID configuration cleared on the controller. Besides,
> > you would expect one drive to fail at a time, and if this
> > happened, the hot spare would surely be pressed into service.
> > Nothing like this has happened though - the controller is
> > reporting several drives (not always the same ones) failed
> > simultaneously, but when the array is re-created from the
> > disks, everything works fine. Problem is, it goes down again
> > a day or so later.
> >
> > As for heat, there is nothing being reported there and the
> > fans that cool that area are working.
> >
> > Any other ideas gratefully received!
> >
> > Barnaby Scott
>
>This is very unlikely to be OS related. But here are a few pointers:
>
>1) Check the make/model of the drives. Certain SCSI drive models had a 
>firmware glitch a while ago that made them disconnect from a RAID. I had 
>personal experience with these (Seagate U320).
>
>2) What happened in October? Did anything change hardware-, software-, or 
>power-wise?
>
>3) NVRAM and disk mismatch: I'd say check the controller. Is the backup 
>battery present but weak?
>
>4) Unlikely to be the source, but run a test on your physical RAM using 
>MEMTEST86+ and check that the power supply is sufficient and working 
>properly.
>
>

I've had some RAID drives disconnect and go missing, all of which cleared 
and rebuilt after a full power-off reboot.  I believe this was due to some 
power issues in my area.  Specifically, my line power from the utility was 
running high, over 127 volts, making over-voltage spikes prevalent.  On a 
couple of spikes I saw the drives disconnect.

So it could be power related.
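
If you can log line voltage (from a UPS or a meter), a rough way to spot 
readings like mine is to compare them against a tolerance band.  The sketch 
below is only an illustration: the 120 V nominal figure, the +/-5% band, and 
the sample readings are all assumptions, not measurements from either 
machine.

```python
# Rough sketch: flag line-voltage readings outside a tolerance band.
# NOMINAL_VOLTS and TOLERANCE are assumed values for illustration only.
NOMINAL_VOLTS = 120.0
TOLERANCE = 0.05  # +/-5% around nominal (assumption)


def classify(reading, nominal=NOMINAL_VOLTS, tolerance=TOLERANCE):
    """Return 'over', 'under', or 'ok' for a single voltage reading."""
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    if reading > high:
        return "over"
    if reading < low:
        return "under"
    return "ok"


# Hypothetical samples, e.g. copied by hand from a UPS display.
readings = [119.5, 127.3, 121.0, 128.1]

# Keep only the readings that fall outside the band.
flagged = [(v, classify(v)) for v in readings if classify(v) != "ok"]
print(flagged)
```

If the flagged readings line up in time with the array dropping drives, 
that would point toward power rather than the controller or the OS.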

On temperature, I would put in an external temperature probe and check the 
readings from that.  Some remote KVM solutions now include temperature probes.

         -Derek

