From owner-freebsd-current@FreeBSD.ORG Sun Nov  7 10:41:04 2004
Date: Sun, 7 Nov 2004 11:40:59 +0100 (CET)
From: Lukas Ertl <le@FreeBSD.org>
To: freebsd@newmillennium.net.au
Cc: 'Greg 'groggy' Lehey', freebsd-current@FreeBSD.org
Subject: RE: Gvinum RAID5 performance
Message-ID: <20041107113342.K570@korben.prv.univie.ac.at>
In-Reply-To: <00a701c4c466$01acd9f0$0201000a@riker>
References: <00a701c4c466$01acd9f0$0201000a@riker>

On Sun, 7 Nov 2004 freebsd@newmillennium.net.au wrote:

> In geom_vinum_plex.c, line 575:
>
>         /*
>          * RAID5 sub-requests need to come in correct order, otherwise
>          * we trip over the parity, as it might be overwritten by
>          * another sub-request.
>          */
>         if (pbp->bio_driver1 != NULL &&
>             gv_stripe_active(p, pbp)) {
>                 /* Park the bio on the waiting queue. */
>                 pbp->bio_cflags |= GV_BIO_ONHOLD;
>                 bq = g_malloc(sizeof(*bq), M_WAITOK | M_ZERO);
>                 bq->bp = pbp;
>                 mtx_lock(&p->bqueue_mtx);
>                 TAILQ_INSERT_TAIL(&p->wqueue, bq, queue);
>                 mtx_unlock(&p->bqueue_mtx);
>         }
>
> It seems we are holding back all requests to a currently active
> stripe, even if it is just a read and would never write anything
> back.

No, only writes are held back: pbp->bio_driver1 is NULL when it's a
normal read.

> 1. To calculate parity, we could simply read the old data (that was
> about to be overwritten) and the old parity, and recalculate the
> parity based on that information, rather than reading in all the
> stripes (on the assumption that the original parity was correct).
> This would still take approximately the same amount of time, but
> would leave the other disks in the stripe available for other I/O.

That's how it's already done: the old parity and the old data are
read, and the new parity and the new data are written.  (The XOR
arithmetic behind this is sketched below.)

> 2. If there are two or more writes pending for the same stripe (that
> is, up to the point where the data/parity has been written), they
> should be condensed into a single operation, so that there is a
> single write to the parity rather than one write per operation.
> This way we should be able to get close to (N - 1) * disk throughput
> for large sequential writes, without compromising the integrity of
> the parity on disk.
>
> 3. When calculating parity as per (2), we should operate on whole
> blocks (as defined by the underlying device). This provides the
> benefit of being able to write a complete block to the subdisk, so
> the underlying mechanism does not have to do a read/update/write
> operation to write a partial block.

These are interesting ideas and I'm going to think about them.
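For reference, the read-modify-write update discussed under point 1
boils down to a per-byte XOR: new parity = old parity ^ old data ^
new data.  A minimal sketch with hypothetical names, not the actual
gvinum code:

#include <stddef.h>

/*
 * Recompute RAID5 parity in place after a partial-stripe write:
 * new parity = old parity XOR old data XOR new data.
 * Hypothetical illustration only.
 */
static void
parity_update(unsigned char *parity, const unsigned char *old_data,
    const unsigned char *new_data, size_t len)
{
        size_t i;

        for (i = 0; i < len; i++)
                parity[i] ^= old_data[i] ^ new_data[i];
}

This works because XOR is its own inverse: XORing the old data out of
the parity and the new data in leaves the parity consistent with the
untouched disks in the stripe, without reading them.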
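Point 2 could look roughly like this: collect the writes pending
against one stripe, fold them all into the parity buffer, and issue a
single parity write for the batch.  All types and names below are
made up for illustration, and the sketch assumes the pending writes
don't overlap each other:

#include <sys/queue.h>
#include <stddef.h>

struct pending_write {
        TAILQ_ENTRY(pending_write) link;
        unsigned char *old_data;  /* data about to be overwritten */
        unsigned char *new_data;  /* data to be written */
        size_t len;               /* bytes covered in the stripe */
};
TAILQ_HEAD(pw_list, pending_write);

/*
 * Fold every write pending against a stripe into its parity buffer,
 * so only one parity write is needed for the whole batch.
 */
static void
stripe_collapse(struct pw_list *pending, unsigned char *parity,
    size_t plen)
{
        struct pending_write *pw;
        size_t i;

        TAILQ_FOREACH(pw, pending, link)
                for (i = 0; i < pw->len && i < plen; i++)
                        parity[i] ^= pw->old_data[i] ^ pw->new_data[i];

        /* ...then issue the data writes plus a single parity write. */
}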
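And for point 3, aligning the update to whole device blocks is just a
bit of offset arithmetic; a sketch, again with hypothetical names,
where blksize would come from the underlying provider:

#include <stddef.h>

/*
 * Round an update region out to whole device blocks, so the subdisk
 * never has to do a read/update/write cycle on a partial block.
 * Stores the aligned start offset and returns the aligned length.
 */
static size_t
round_to_blocks(size_t offset, size_t len, size_t blksize,
    size_t *aligned_offset)
{
        size_t end = offset + len;

        *aligned_offset = offset - (offset % blksize);
        if (end % blksize != 0)
                end += blksize - (end % blksize);
        return (end - *aligned_offset);
}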
thanks,
le

-- 
Lukas Ertl                        http://homepage.univie.ac.at/l.ertl/
le@FreeBSD.org                    http://people.freebsd.org/~le/