Date:      Wed, 14 Jul 2004 16:37:26 -0400
From:      Allan Fields <bsd@afields.ca>
To:        Andrew Atrens <atrens@nortelnetworks.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: help - catastrophic RAID failure ...
Message-ID:  <20040714203726.GA4773@afields.ca>
In-Reply-To: <200407141044.11474.atrens@nortelnetworks.com>
References:  <200407141044.11474.atrens@nortelnetworks.com>

On Wed, Jul 14, 2004 at 10:44:06AM -0400, Andrew Atrens wrote:
> 
> Also, can anyone give any general advice on reconstructing partition
> tables. I used the 4.7 installer to partition the disks the first time.
> And I'm pretty sure that I remember the partition sizes. Well, all 
> except the swap partition (but I think that that was 3G). I realise that
> recovering this might be an iterative process but am hopeful that if
> I get the 'a' partition back the rest may <slowly> fall back into place.
> This time around I guess I'll be using the 4.9 installer - does anyone
> know if there are any substantive differences in how it sets up its 
> partitions/filesystems vs the 4.7 installer?
>
> As always, any advice you folks could provide would be greatly appreciated.

I would suggest not going the sysinstall route for recovering your partitions
and disklabels.  It just seems too risky.  Instead, first take a dump of
the raw array and then try to rebuild it manually: once you have found
the old offsets and sizes, you can use fdisk and disklabel directly to
rebuild the partition table and disklabel.


> Hi, I've got a RocketRAID 404, and I'm running FreeBSD 4.9.  When I upgraded
> drivers from 1.2 to 1.22 my machine locked up on boot. On the next reboot
> the RAID array (I had a 4-disk 160G 1/0 striped-mirrored RAID) was 
> reported as being severely damaged, I was prompted to 'check cables' and was 
> presented with 3 options - Destroy, Reboot, or Continue.

The problem with those types of opaque BIOS menus is that you can't always
be entirely sure what actions will be taken.  My Promise FastTrak BIOS
will even take the liberty of automatically creating a new single-drive
array on top of an existing one at boot time.  It's a DOS/Windows world.

> My controller BIOS was at 2.11. Being hopeful, I upgraded my BIOS to 
> 2.13c and rebooted again. Same message. Next I downgraded my BIOS back to 2.11.
> 
> The BIOS showed the drives as being something like -
> 
> 1.  Primary: Maxtor 80G ATA/133, BOOT  (Free)
>     Secondary: not present
> 2.  Maxtor 120G ATA/133  (120G Striped array)
>     Secondary: not present
> 3.  Primary: Maxtor 80G ATA/133 (Free)
>     Secondary: not present
> 4.  Maxtor 80G ATA/133  (80G Striped array)
>     Secondary: not present
> 
> At this point I thought that the best thing to do was to delete and 
> recreate the array using the same settings as I had used to create it
> initially. My thought was that the BIOS, being deterministic, would
> create the array in the same way that it had the first time, considering
> I was still using the same BIOS version - 2.11 that I had used before.
> 
> This worked, but when I rebooted my partition tables were empty. It's my
> theory that whatever destroyed the RAID setup also destroyed my partition 
> tables, and that there is a good chance that a lot of my data is still there.
>
> So how I'm going to proceed is to try to rebuild my partition tables, 
> and hopefully the filesystems will still be there.

At this point you could comb through your disk using dd and hexdump to
see what is actually on there.  Look at the primary partition table and
MBR.  Maybe all you need to do is restore that.
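
To see what is left of the MBR, a small sketch like the following decodes
the four primary partition entries from a copy of sector 0 (for example,
the first 512 bytes saved off the array with dd).  It assumes the classic
PC layout: four 16-byte entries starting at byte 446 (boot flag, type at
byte 4, then little-endian 32-bit start LBA and sector count at bytes 8
and 12) followed by the 0x55AA signature at bytes 510-511.  Type 0xA5 is
the FreeBSD slice type.  The function name is hypothetical.

```python
import struct


def parse_mbr(sector0: bytes):
    """Decode the non-empty primary partition entries of a classic PC MBR."""
    if len(sector0) < 512:
        raise ValueError("need at least 512 bytes")
    if sector0[510:512] != b"\x55\xaa":
        raise ValueError("no 0x55AA signature: the table is missing or damaged")
    entries = []
    for i in range(4):
        e = sector0[446 + 16 * i:446 + 16 * (i + 1)]
        boot, ptype = e[0], e[4]
        start, nsectors = struct.unpack_from("<II", e, 8)
        if ptype != 0:                      # type 0 marks an unused slot
            entries.append({"slot": i + 1, "active": boot == 0x80,
                            "type": ptype, "start": start, "sectors": nsectors})
    return entries
```

If the signature is intact but every slot is empty, the MBR was wiped and
only the slice entry needs recreating with fdisk; if the entries look sane,
the damage is further in (the disklabel or the filesystems themselves).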

> What I need from you folks is validation that my understanding of the 
> situation is correct, and that I'm proceeding in the correct way.

If you want to be extra sure, locate some spare drives and dump it all
out so that you won't risk damaging it further during recovery.  A
few years back, I made the mistake of trying to fix a filesystem to
recover its contents on a defective Fujitsu MPG drive which was
semi-operational, and it totally messed it up.  I probably could have
got away with incrementally reading data off the disk onto another
drive.  Dump first, then fix it.
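
The "dump first" step can be sketched roughly as below: copy the source
block by block, and when a block cannot be read, retry it one sector at a
time and write zeros for sectors that still fail, so every later byte
stays at its original offset.  This is similar in spirit to dd with
conv=noerror,sync, but with per-sector retries.  The function name is
hypothetical, and so is the assumption that seeking to the end reports
the size; that holds for image files, but for a raw device you may need
to pass the size explicitly.

```python
import os


def image_device(src, dst, size=None, block=64 * 1024, sector=512):
    """Copy src to dst, zero-filling unreadable sectors so that later data
    keeps its original offset. Returns the list of bad sector numbers."""
    bad = []
    with open(src, "rb", buffering=0) as fin, open(dst, "wb") as fout:
        if size is None:
            size = fin.seek(0, os.SEEK_END)   # works for files; raw devices may need size=
        offset = 0
        while offset < size:
            n = min(block, size - offset)
            try:
                fin.seek(offset)
                data = fin.read(n)
            except OSError:
                data = b""
            if len(data) != n:
                # Block failed or came back short: retry one sector at a time.
                parts = []
                for pos in range(offset, offset + n, sector):
                    m = min(sector, offset + n - pos)
                    try:
                        fin.seek(pos)
                        piece = fin.read(m)
                    except OSError:
                        piece = b""
                    if len(piece) != m:
                        bad.append(pos // sector)
                        piece = b"\0" * m     # keep the output aligned
                    parts.append(piece)
                data = b"".join(parts)
            fout.write(data)
            offset += n
    return bad
```

Working on the copy afterwards means a failed repair attempt costs nothing
but another copy from the image.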

> And, of course, any advice you could give about 'what went wrong' would 
> also be MOST appreciated. The RAID array has been so reliable that
> I hadn't bothered making any backups in a long, long time - AND I HAD
> SOME CRITICAL, IRREPLACEABLE DATA on there. :( :( :(

More obvious suggestions:

Because of the remote possibility of file system corruption in
software (or, in this case, hardware), RAID cannot reliably replace
backups.  One solution is to take periodic snapshots onto a secondary
dump/image drive.  A removable drive tray makes swapping that drive
in and out easy.

It's also useful to store a copy of the fdisk output and disklabels on
another machine or with your backups.  Another idea is to generate an
ls -ilR catalog periodically, and rsync it along with critical files off-site.
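
A catalog in the spirit of ls -ilR could be produced with a sketch like
this one: one line per entry with inode number, mode string, size,
modification time and path.  The output format and function name are
hypothetical and do not match ls exactly; the point is a plain text file
small enough to rsync off-site alongside the fdisk and disklabel output.

```python
import os
import stat
import time


def catalog(root, out_path):
    """Write one line per file or directory under root:
    inode, mode, size, mtime, path. Returns the number of entries."""
    count = 0
    with open(out_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()                          # deterministic traversal order
            for name in sorted(dirnames + filenames):
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)              # do not follow symlinks
                except OSError:
                    continue                         # vanished or unreadable; skip it
                mtime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime))
                out.write("%d %s %d %s %s\n" % (st.st_ino, stat.filemode(st.st_mode),
                                                st.st_size, mtime, path))
                count += 1
    return count
```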

-- 
 Allan Fields, AFRSL - http://afields.ca
 2D4F 6806 D307 0889 6125  C31D F745 0D72 39B4 5541


