Date:      Mon, 23 Jul 2007 12:36:51 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Alexander Leidinger <Alexander@Leidinger.net>
Cc:        Ulf Lilleengen <lulf@FreeBSD.org>, perforce@freebsd.org, Eric Anderson <anderson@freebsd.org>
Subject:   Re: PERFORCE change 123662 for review
Message-ID:  <20070723103651.GD5456@garage.freebsd.pl>
In-Reply-To: <20070720150716.77d2636a@deskjail>
References:  <200707172109.l6HL9PMJ078780@repoman.freebsd.org> <46A03390.3030602@freebsd.org> <20070720123524.GA71360@twoflower.idi.ntnu.no> <20070720150716.77d2636a@deskjail>

On Fri, Jul 20, 2007 at 03:07:16PM +0200, Alexander Leidinger wrote:
> Quoting Ulf Lilleengen <lulf@FreeBSD.org> (Fri, 20 Jul 2007 14:35:24 +0200):
>
> [growing RAID-5]
> > Well, what I do is attach/create the new subdisk as usual, but since it's
> > a RAID-5 array that I know is operational, I give the subdisk a flag and
> > set the plex in a resize state. Then, in the RAID-5 code, I modify
> > gv_raid5_offset (which basically computes offsets within a subdisk based
> > on the number of subdisks and the stripe size). However, instead of taking
> > all subdisks into the calculation, I only take those that do not have the
> > GROW flag when reading, and I take all subdisks into the calculation when
> > it's a write.
> >
> > This means that I can create a gv_grow_plex function that reads
> > (stripesize x sdcount) bytes from the subdisks that do not have the GROW
> > flag and writes that data to the plex (including all subdisks). This way,
> > I sort of overwrite the old data, but the data is spread out over the new
> > subdisks. I'm sorry if this seems a bit complex; just ask more questions
> > if you didn't understand.
>
> Do you use the additional drive(s) only to write checksums to, or do
> you write real data to them? If the latter, how do you make sure you
> read the right data when you read data that was written there a moment
> before (how do you know to read from all subdisks and not only from a
> subset in this case)?

You only need to keep moving an offset forward while you synchronize the new disk.

When you start you have:

	Disk0	Disk1	Disk2	NewDisk

	D0	D1	P0	U
	D2	P1	D3	U
	P2	D4	D5	U
	D6	D7	P3	U
	D8	P4	D9	U
	P5	D10	D11	U
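The layout above (and the final one further down) follows a simple rule: parity starts on the last disk and moves one disk to the left on each row, while data blocks fill the remaining disks from left to right. A minimal Python sketch of that mapping, assuming one stripe unit per cell; parity_disk(), raid5_map() and layout() are hypothetical names and are not gvinum's gv_raid5_offset:

```python
def parity_disk(row, ndisks):
    # Parity starts on the last disk and moves one disk left on each row.
    return (ndisks - 1) - row % ndisks


def raid5_map(block, ndisks):
    """Return (row, disk) for logical data block `block` on a RAID-5
    plex with `ndisks` subdisks (hypothetical helper)."""
    row, slot = divmod(block, ndisks - 1)   # ndisks - 1 data blocks per row
    pdisk = parity_disk(row, ndisks)
    # Data fills the non-parity disks from left to right.
    disk = slot if slot < pdisk else slot + 1
    return row, disk


def layout(nblocks, ndisks):
    """Render `nblocks` data blocks as rows of labels, like the diagrams."""
    nrows = -(-nblocks // (ndisks - 1))     # ceiling division
    grid = [["U"] * ndisks for _ in range(nrows)]
    for r in range(nrows):
        grid[r][parity_disk(r, ndisks)] = f"P{r}"
    for b in range(nblocks):
        r, d = raid5_map(b, ndisks)
        grid[r][d] = f"D{b}"
    return grid


for row in layout(12, 3):
    print("\t".join(row))
```

Running it prints the six rows of the starting layout; layout(12, 4) gives the four rows of the final four-disk layout, with P<x> in place of NP<x>. In a real plex the byte offset inside a subdisk would presumably be row * stripesize.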

After some time you have:

	Disk0	Disk1	Disk2	NewDisk

	D0	D1	D2	NP0
	D3	D4	NP1	D5
	U	U	U	U
-->	D6	D7	P3	U
	D8	P4	D9	U
	P5	D10	D11	U
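The intermediate state can be reproduced with a small simulation, assuming one stripe unit per cell and that data is moved strictly in increasing logical order, one row of the new layout at a time. grow() and data_disk() are hypothetical names used only for illustration. The sketch asserts that the physical row being overwritten never still holds old data that has not been read yet, which is the property that lets the copy run in place:

```python
def data_disk(block, ndisks):
    """(row, disk) of data block `block`, using the rotation in the diagrams."""
    row, slot = divmod(block, ndisks - 1)
    pdisk = (ndisks - 1) - row % ndisks
    return row, (slot if slot < pdisk else slot + 1)


def grow(grid, nblocks, max_rows):
    """Move data into the layout with one more disk, one new row at a time,
    in increasing logical order. `grid` (rows of labels) is changed in place
    and must already contain the new, unused column. Returns the watermark:
    the number of data blocks that now live in the new layout."""
    newn = len(grid[0])
    oldn = newn - 1
    moved = rows_done = 0
    while rows_done < max_rows and moved < nblocks:
        batch = list(range(moved, min(moved + newn - 1, nblocks)))
        # Old data still stored on the physical row about to be overwritten
        # must belong to this batch (or have been moved already).
        assert min((rows_done + 1) * (oldn - 1), nblocks) - 1 <= batch[-1]
        # Read the whole batch from the old layout before writing anything.
        values = [grid[r][d] for r, d in (data_disk(b, oldn) for b in batch)]
        row = ["U"] * newn
        row[(newn - 1) - rows_done % newn] = f"NP{rows_done}"
        for b, v in zip(batch, values):
            row[data_disk(b, newn)[1]] = v
        grid[rows_done] = row
        rows_done += 1
        moved = batch[-1] + 1
    # Rows between the new data and the first row still holding unmoved
    # old data are stale; mark them unused, as the diagram does.
    for r in range(rows_done, min(moved // (oldn - 1), len(grid))):
        grid[r] = ["U"] * newn
    return moved


start = [["D0", "D1", "P0"], ["D2", "P1", "D3"], ["P2", "D4", "D5"],
         ["D6", "D7", "P3"], ["D8", "P4", "D9"], ["P5", "D10", "D11"]]
grid = [row + ["U"] for row in start]   # NewDisk starts unused
print(grow(grid, 12, max_rows=2))       # 6: the watermark after two new rows
for row in grid:
    print("\t".join(row))
```

After two new rows, grid matches the "after some time" picture; letting grow() run to completion yields the final layout with the last two rows unused.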

And at the end you have:

	Disk0	Disk1	Disk2	NewDisk

	D0	D1	D2	NP0
	D3	D4	NP1	D5
	D6	NP2	D7	D8
	NP3	D9	D10	D11
	U	U	U	U
	U	U	U	U

Where:
D<x> - data block
P<x> - parity block
NP<x> - new parity block
U - unused
--> - if the offset in an I/O request is below that point, you use four disks;
      if it is above that point, you use only three disks
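The "-->" rule can be sketched as a single comparison, assuming the watermark is tracked as a count of data blocks already moved rather than as a row of the old layout. route() and data_disk() are hypothetical names, not gvinum functions. Because reads and writes go through the same test, data written below the watermark is read back from the new layout. The sketch ignores I/O that hits the row currently being moved, which would presumably have to be held back until that row has been written:

```python
def data_disk(block, ndisks):
    """(row, disk) of data block `block`, using the rotation in the diagrams."""
    row, slot = divmod(block, ndisks - 1)
    pdisk = (ndisks - 1) - row % ndisks
    return row, (slot if slot < pdisk else slot + 1)


def route(block, watermark, olddisks):
    """Pick the layout for an I/O (read or write) to data block `block`
    while a plex grows from `olddisks` to `olddisks + 1` subdisks.
    Returns (ndisks, row, disk). Hypothetical helper."""
    ndisks = olddisks + 1 if block < watermark else olddisks
    return (ndisks, *data_disk(block, ndisks))


# The "-->" sits at old row 3; with two data blocks per old row,
# that corresponds to a watermark of 3 * 2 = 6 data blocks.
for b in (5, 6):
    print(f"D{b}:", route(b, 6, 3))   # D5: (4, 1, 3) and D6: (3, 3, 0)
```

D5 is found on NewDisk in row 1 of the four-disk layout, and D6 is still on Disk0 in row 3 of the three-disk layout, as in the intermediate picture.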

BTW, such functionality is really cool.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!