From owner-freebsd-current@FreeBSD.ORG Sun Nov  7 10:41:04 2004
Date: Sun, 7 Nov 2004 11:40:59 +0100 (CET)
From: Lukas Ertl <le@FreeBSD.org>
To: freebsd@newmillennium.net.au
Cc: 'Greg 'groggy' Lehey', freebsd-current@FreeBSD.org
Subject: RE: Gvinum RAID5 performance
Message-ID: <20041107113342.K570@korben.prv.univie.ac.at>
In-Reply-To: <00a701c4c466$01acd9f0$0201000a@riker>
References: <00a701c4c466$01acd9f0$0201000a@riker>

On Sun, 7 Nov 2004 freebsd@newmillennium.net.au wrote:

> In geom_vinum_plex.c, line 575:
>
>         /*
>          * RAID5 sub-requests need to come in correct order, otherwise
>          * we trip over the parity, as it might be overwritten by
>          * another sub-request.
>          */
>         if (pbp->bio_driver1 != NULL &&
>             gv_stripe_active(p, pbp)) {
>                 /* Park the bio on the waiting queue. */
>                 pbp->bio_cflags |= GV_BIO_ONHOLD;
>                 bq = g_malloc(sizeof(*bq), M_WAITOK | M_ZERO);
>                 bq->bp = pbp;
>                 mtx_lock(&p->bqueue_mtx);
>                 TAILQ_INSERT_TAIL(&p->wqueue, bq, queue);
>                 mtx_unlock(&p->bqueue_mtx);
>         }
>
> It seems we are holding back all requests to a currently active
> stripe, even if it is just a read and would never write anything
> back.

No, only writes are held back: pbp->bio_driver1 is NULL when it's a
normal read.

> 1. To calculate parity, we could simply read the old data (that was
> about to be overwritten) and the old parity, and recalculate the
> parity based on that information, rather than reading in all the
> stripes (on the assumption that the original parity was correct).
> This would still take approximately the same amount of time, but
> would leave the other disks in the stripe available for other I/O.

That's how it's already done: the old parity and the old data are
read, and the new parity and the new data are written.  (The XOR
arithmetic behind this is sketched below.)

> 2. If there are two or more writes pending for the same stripe (that
> is, up to the point where the data/parity has been written), they
> should be condensed into a single operation, so that there is a
> single write to the parity rather than one write per operation.
> This way we should be able to get close to (N - 1) * disk throughput
> for large sequential writes, without compromising the integrity of
> the parity on disk.
>
> 3. When calculating parity as per (2), we should operate on whole
> blocks (as defined by the underlying device). This provides the
> benefit of being able to write a complete block to the subdisk, so
> the underlying mechanism does not have to do a read/update/write
> operation to write a partial block.

These are interesting ideas and I'm going to think about them.
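For reference, the read-modify-write update discussed under point 1
boils down to a per-byte XOR: new parity = old parity ^ old data ^
new data.  A minimal sketch with hypothetical names, not the actual
gvinum code:

#include <stddef.h>

/*
 * Recompute RAID5 parity in place after a partial-stripe write:
 * new parity = old parity XOR old data XOR new data.
 * Hypothetical illustration only.
 */
static void
parity_update(unsigned char *parity, const unsigned char *old_data,
    const unsigned char *new_data, size_t len)
{
        size_t i;

        for (i = 0; i < len; i++)
                parity[i] ^= old_data[i] ^ new_data[i];
}

This works because XOR is its own inverse: XORing the old data out of
the parity and the new data in leaves the parity consistent with the
untouched disks in the stripe, without reading them.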
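Point 2 could look roughly like this: collect the writes pending
against one stripe, fold them all into the parity buffer, and issue a
single parity write for the batch.  All types and names below are
made up for illustration, and the sketch assumes the pending writes
don't overlap each other:

#include <sys/queue.h>
#include <stddef.h>

struct pending_write {
        TAILQ_ENTRY(pending_write) link;
        unsigned char *old_data;  /* data about to be overwritten */
        unsigned char *new_data;  /* data to be written */
        size_t len;               /* bytes covered in the stripe */
};
TAILQ_HEAD(pw_list, pending_write);

/*
 * Fold every write pending against a stripe into its parity buffer,
 * so only one parity write is needed for the whole batch.
 */
static void
stripe_collapse(struct pw_list *pending, unsigned char *parity,
    size_t plen)
{
        struct pending_write *pw;
        size_t i;

        TAILQ_FOREACH(pw, pending, link)
                for (i = 0; i < pw->len && i < plen; i++)
                        parity[i] ^= pw->old_data[i] ^ pw->new_data[i];

        /* ...then issue the data writes plus a single parity write. */
}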
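And for point 3, aligning the update to whole device blocks is just a
bit of offset arithmetic; a sketch, again with hypothetical names,
where blksize would come from the underlying provider:

#include <stddef.h>

/*
 * Round an update region out to whole device blocks, so the subdisk
 * never has to do a read/update/write cycle on a partial block.
 * Stores the aligned start offset and returns the aligned length.
 */
static size_t
round_to_blocks(size_t offset, size_t len, size_t blksize,
    size_t *aligned_offset)
{
        size_t end = offset + len;

        *aligned_offset = offset - (offset % blksize);
        if (end % blksize != 0)
                end += blksize - (end % blksize);
        return (end - *aligned_offset);
}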
thanks,
le

-- 
Lukas Ertl                        http://homepage.univie.ac.at/l.ertl/
le@FreeBSD.org                    http://people.freebsd.org/~le/