From: "Marco Haddad"
To: "Peter Giessel"
Cc: freebsd-geom@freebsd.org, Ulf Lilleengen, Joe Koberg
Date: Fri, 2 Nov 2007 22:32:54 -0300
Subject: Re: gvinum and raid5

Hi,

I must say that I had strong faith in vinum too. I used it on a dozen servers
to build raid5 volumes, especially when hard drives were small and unreliable.
Naturally I had a few crashes, but replacing the failed disk was easy and the
rebuild worked every time.

I started using gvinum when I ran into the first SCSI controller that vinum
did not support. Since gvinum solved vinum's problem with that controller, it
immediately inherited the faith I had in vinum. I kept using gvinum after
that, until my faith was shaken by a hard disk crash: I could not get the
replacement drive added back into the raid5 volume.

After a lot of banging my head against the wall, I came up with the workaround
procedure below for replacing a failed disk. I used it again just today to
replace a SATA hard disk that I suspect was causing an intermittent failure,
with such success that I am starting to think it isn't so bad... Anyway, I'll
describe a simple example in order to get your comments.

Suppose a simple system with three hard disks: ad0, ad1 and ad2. They were
fdisked and labeled identically. ad0s1a is /, and ad0s1d, ad1s1d and ad2s1d
are of the same size and are used by gvinum as drives AD0, AD1 and AD2. Each
disk has a single slice, and the three subdisks are joined in a raid5 plex
forming the volume VOL. The gvinum create script would be the following:

  drive AD0 device /dev/ad0s1d
  drive AD1 device /dev/ad1s1d
  drive AD2 device /dev/ad2s1d
  volume VOL
    plex org raid5 128K
      sd drive AD0
      sd drive AD1
      sd drive AD2

Suppose ad1 crashes and gvinum marks it as down. With the command "gvinum l"
we would get something like this:

  3 drives:
  D AD0          State: up        /dev/ad0s1d ...
  D AD1          State: down      /dev/ad1s1d ...
  D AD2          State: up        /dev/ad2s1d ...

  1 volumes:
  V VOL          State: up        ...

  1 plexes:
  P VOL.p0    R5 State: degraded  ...

  3 subdisks:
  S VOL.p0.s0    State: up        D: AD0 ...
  S VOL.p0.s1    State: down      D: AD1 ...
  S VOL.p0.s2    State: up        D: AD2 ...

The first thing I do is edit fstab and comment out the line that mounts
/dev/gvinum/VOL, wherever it was mounted. This is necessary because once the
volume is mounted gvinum refuses most commands, and umount doesn't do the
trick. Then I shut the system down, replace the hard disk and bring it up
again.

At this point the first weird thing can be noted. With "gvinum l" you would
get:

  2 drives:
  D AD0          State: up        /dev/ad0s1d ...
  D AD2          State: up        /dev/ad2s1d ...

  1 volumes:
  V VOL          State: up        ...

  1 plexes:
  P VOL.p0    R5 State: up        ...

  3 subdisks:
  S VOL.p0.s0    State: up        D: AD0 ...
  S VOL.p0.s1    State: up        D: AD1 ...
  S VOL.p0.s2    State: up        D: AD2 ...

What? Drive AD1 is gone, OK, but why is the subdisk VOL.p0.s1 up? And that
makes the plex up instead of degraded. The first time I saw it I got the
shivers.

The next step is to fdisk and label the new disk just like the old one. The
new disk can be bigger, but I think the partition ad1s1d must be the same
size as before.
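Just to illustrate that step, a minimal sketch of copying the layout from the
surviving ad0 to the new ad1 could look roughly like this (the exact fdisk and
bsdlabel invocations and the /tmp file names are only an illustration, and
assume both disks have compatible geometry; check fdisk(8) and bsdlabel(8) on
your release before writing anything to the disk):

  # dump the slice table of a surviving disk and write it to the new one
  fdisk -p ad0 > /tmp/fdisk.ad0
  fdisk -i -f /tmp/fdisk.ad0 ad1

  # copy the bsdlabel so ad1s1d comes out the same size as before
  bsdlabel ad0s1 > /tmp/bsdlabel.ad0s1
  bsdlabel -R ad1s1 /tmp/bsdlabel.ad0s1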
At this point it should be enough to run "gvinum create" with a script file
containing only the line:

  drive AD1 device /dev/ad1s1d

but gvinum panics on that, and the system locks up or dumps core. So something
weird must be done instead: remove all the gvinum objects with "gvinum rm".
Yes, just to make it clear, in this case the commands would be:

  gvinum rm -r AD0
  gvinum rm -r AD2
  gvinum rm VOL
  gvinum rm VOL.p0
  gvinum rm VOL.p0.s1

Then we can run "gvinum create" with the original script to recreate
everything. Now it is all up again, but it still isn't quite right. The
subdisk VOL.p0.s1 must be marked as stale with:

  gvinum setstate -f stale VOL.p0.s1

This brings the plex back to degraded mode, and then we can run:

  gvinum start VOL

to rebuild it. It may take about an hour per 100 GB of volume space, so we'd
better grab some lunch... The progress can be checked at any time with:

  gvinum ls

After that, a "fsck -t ufs /dev/gvinum/VOL" will probably catch some errors
left behind when the drive went down. Finally, we just need to uncomment that
line in fstab and reboot.

I think there's no easier way...
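To sum up, once the new disk is sliced and labeled, the whole recovery for
this example boils down to roughly the following sequence (the file name
raid5.conf is just a stand-in for wherever you keep the original create
script shown above):

  gvinum rm -r AD0
  gvinum rm -r AD2
  gvinum rm VOL
  gvinum rm VOL.p0
  gvinum rm VOL.p0.s1
  gvinum create raid5.conf            # recreate everything from the original script
  gvinum setstate -f stale VOL.p0.s1  # mark the new subdisk as stale
  gvinum start VOL                    # rebuild; watch progress with "gvinum ls"
  fsck -t ufs /dev/gvinum/VOL         # clean up whatever the crash left behind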
Regards,

Marco Haddad

On 11/2/07, Peter Giessel wrote:
>
> On Friday, November 02, 2007, at 01:04AM, "Joe Koberg" <joe@rootnode.com>
> wrote:
> >Ulf Lilleengen wrote:
> >> On ons, okt 31, 2007 at 12:14:18 -0300, Marco Haddad wrote:
> >>
> >>> I found in recent researchs that a lot of people say gvinum should not
> >>> be trusted, when it comes to raid5. I began to get worried. Am I alone
> >>> using
> >>>
> >> I'm working on it, and there are definately people still using it. (I've
> >> recieved a number of private mails as well as those seen on this list).
> >> IMO, gvinum can be trusted when it comes to raid5. I've not experienced
> >> any corruption-bugs or anything like that with it.
> >>
> >
> >The source of the mistrust may be the fact that few software-only RAID-5
> >systems can guarantee write consistency across a multi-drive
> >read-update-write cycle in the case of, e.g., power failure.
>
> That may be the true source, but my source of mistrust comes from a few
> drive failures and gvinum's inability to rebuild the replaced drive.
>
> Worked fine under vinum in tests, tried the same thing in gvinum (granted,
> this was under FreeBSD 5), and the array failed to rebuild.
>
> I can't be 100% sure it wasn't a flakey ATA controller and not gvinum's
> fault, and I no longer have access to the box to play with, but when I was
> playing with gvinum, replacing a failed drive usually resulted in panics.
>