From: João Carlos Mendes Luís <jonny@jonny.eng.br>
Date: Tue, 30 Mar 2004 11:07:55 -0300
To: Greg 'groggy' Lehey
Cc: stable@freebsd.org, hackers@freebsd.org
Subject: Re: Serious bug in vinum?

Greg 'groggy' Lehey wrote:
> On Tuesday, 30 March 2004 at 0:32:38 -0300, João Carlos Mendes Luís wrote:
>> Sorry for the cross-posting, but neither the author nor freebsd-bugs
>> acknowledged my message, and I think this is a very serious bug in
>> vinum, leading to loss of data...
>>
>> If these are not the correct forums for this, please forgive me and
>> tell me which is the correct one.
>>
>> PS: Please CC: me, since I'm not currently subscribed to these
>> lists.
>
> Sorry for the lack of response. Yes, I saw it, and so did Lukas Ertl,
> and we've been discussing it. This list is probably not the best.
>
>> ====================================================
>> Hi Greg,
>>
>> I've been a big fan of vinum since its beginning. I use it as a RAID0
>> and RAID1 solution on lots of servers.
>>
>> In some RAID0 (stripe) configurations, though, I've had some serious
>> problems. If an underlying disk fails, the respective plex and volume
>> do not fail, as they should. This leads to full corruption of data,
>> but worse than that, it leaves a system which believes the data is
>> safe. On one occasion, for example, the backup ran and overwrote good
>> data with bad data, full of zeros.
>>
>> I am not fully aware of vinum programming details, but a quick look
>> at 4.9-STABLE, in file vinumstate.c, dated Jul 7, 2000, shows that at
>> line 588 the function update_volume_state() sets the volume state to
>> up if the plex state is corrupt or better for at least one plex:
>>
>>     for (plexno = 0; plexno < vol->plexes; plexno++) {
>>         struct plex *plex = &PLEX[vol->plex[plexno]];  /* point to the plex */
>>         if (plex->state >= plex_corrupt) {             /* something accessible, */
>>             vol->state = volume_up;
>>             break;
>>         }
>>     }
>>
>> I think this should instead be:
>>
>>     if (plex->state > plex_corrupt) {  /* something accessible, */
>
> Basically, this is a feature and not a bug. A plex that is corrupt is
> still partially accessible, so we should allow access to it.
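Just to make the proposal concrete, this is how I picture the loop in
update_volume_state() with that one-character change (only a sketch
against the 4.9-STABLE source quoted above, untested):

    for (plexno = 0; plexno < vol->plexes; plexno++) {
        struct plex *plex = &PLEX[vol->plex[plexno]];  /* point to the plex */
        if (plex->state > plex_corrupt) {   /* only a fully accessible plex */
            vol->state = volume_up;         /* keeps the volume up */
            break;
        }
    }

With '>' instead of '>=', a volume whose best plex is merely corrupt
would go down instead of staying up and silently serving bad data.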
> If you have two striped plexes both striped between two disks, with
> the same stripe size, and one plex starts on the first drive, and the
> other on the second, and one drive dies, then each plex will lose
> half of its data, every second stripe. But the volume will be
> completely accessible.

A good idea if you have both stripe and mirror, to avoid discarding
the whole disk (see the sketch at the end of this mail). But, IMHO, if
some part of the disk is inaccessible, the volume should go down, and
it should come back up only if the operator explicitly tries recovery
with the setstate command. That is the safe state.

Note that when a hardware error occurs, it may be temporary. I have
already seen lots of sector read or write errors that happen only once
in the whole life of a server. But in vinum, a single read or write
error is enough to change the subdisk state, and it will never come
back without operator intervention.

If it's not easy to check whether an equivalent subdisk is available,
could there at least be an option to choose which behaviour is
expected?

> I think that the real issue here is that Vinum should have returned
> an I/O error for accesses to the defective parts. How did you perform
> the backup?

Simply "rsync -aHSx --delete ..." from another machine every 6 hours.
Note that I do not use the "--ignore-errors" option. I expected that
if the master machine crashed, the slave would have a full snapshot
with at most 6 hours of lost work. When this problem happened, a
backup run had just finished before I could notice it, and most files
had been overwritten with zeroes or deleted. I also have tgz
snapshots, but Murphy's law dictates that this kind of problem will
happen at the latest possible date before the snapshot.

By the way: this discussion reminds me that some vinum operations
panic the server if done in multiuser mode. Is this known to you? I
could not fully reproduce it yet, but if the server is important, I do
not dare execute vinum commands in multiuser mode anymore. This means
that I must be at the console, and not over the network, to configure
or change a vinum configuration, which reduces my telecommuting
options. ;-)
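PS: To convince myself about your two-plex example, I wrote the little
throwaway program below. It is only my own sketch, not vinum code, and
it assumes exactly what you described: two drives, two plexes with the
same stripe size, one plex starting on each drive, and drive 0 failing:

    #include <stdio.h>

    #define STRIPES   8   /* stripes per plex, enough to see the pattern */
    #define DEAD_DISK 0   /* assume drive 0 has failed */

    int main(void)
    {
        /* Plex 0 puts stripe i on disk i % 2; plex 1 starts on the
         * other drive, so its stripe i lands on disk (i + 1) % 2. */
        for (int i = 0; i < STRIPES; i++) {
            int ok_p0 = (i % 2) != DEAD_DISK;
            int ok_p1 = ((i + 1) % 2) != DEAD_DISK;
            printf("stripe %d: plex0 %s, plex1 %s -> volume %s\n", i,
                   ok_p0 ? "ok  " : "LOST",
                   ok_p1 ? "ok  " : "LOST",
                   (ok_p0 || ok_p1) ? "readable" : "gone");
        }
        return 0;
    }

It shows that each plex loses every second stripe, yet for every
stripe at least one plex still has a good copy, so the volume stays
completely readable. I agree that makes sense for the
striped-and-mirrored case; my complaint is only about pure RAID0
volumes, where there is no second plex to cover the holes.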