From: Ivan Voras
To: freebsd-questions@freebsd.org
Date: Sun, 27 Jul 2008 02:55:36 +0200
Subject: Re: graid3
In-Reply-To: <20080725114402.G5386@wojtek.tensor.gdynia.pl>
List-Id: User questions

Wojciech Puchar wrote:
> i read the graid3 manual and http://www.acnc.com/04_01_03.html to make
> sure i know what's RAID3 and i don't understand few things.
>
> 1)
>
> "The number of components must be equal to 3, 5, 9, 17, etc.
> (2^n + 1)."
>
> why it can't be say 5 disks+parity?

The reason is in the definition of "RAID 3", which says that updates to
the RAID device must be atomic. In some ideal universe, RAID 3 is
implemented in hardware and operates on individual bytes, but here we
cannot write to the drives in units other than the sectorsize, and the
sectorsize is 512 bytes.

Parity needs to be calculated for each sector, so at the sector level
the minimum is three sectors: two for data and one for parity. This
means the high-level atomic sectorsize is 2*512 = 1024 bytes. If you
inspect your RAID 3 devices, you'll see just that:

# diskinfo -v /dev/raid3/homes
/dev/raid3/homes
        1024            # sectorsize
        107374181376    # mediasize in bytes (100G)
        104857599       # mediasize in sectors

But each drive has a normal sectorsize of 512:

# diskinfo -v /dev/ad4
/dev/ad4
        512             # sectorsize
        80026361856     # mediasize in bytes (75G)
        156301488       # mediasize in sectors

Sector sizes cannot be arbitrary for various reasons, mostly dealing
with how memory pages and virtual memory are managed. In short, they
need to be powers of two. This restricts us to high-level ("big") sector
sizes of exactly 1024, 2048, 4096, 8192, etc. Since drive sectors are
fixed at 512 bytes, the number of *data* drives must also be a power of
two: 2, 4, 8, 16, etc. Add one more drive for the parity and you get the
sequence from the manual: 3, 5, 9, 17. Your example of 5 disks + parity
would give a sectorsize of 5*512 = 2560 bytes, which is not a power of
two.

In practice, this means that if you have 17 drives in RAID3, the
sectorsize of the array itself will be 16*512 = 8192.
Each write to the array will update all 17 drives before returning (one
sector on each drive, ensuring an atomic operation). Note that a file
system created on such an array will also have its parameters adapted to
the array's sector size (the fragment size will be the sector size).

> 2) "-r Use parity component for reading in round-robin fashion.
> "Without this option the parity component is not used at
> all for reading operations when the device is in a complete state.
> With this option specified random I/O read operations are even 40% faster
> , but sequential reads are slower. One cannot use this option if the -w
> option is also specified."
>
>
> how parity disk could speed up random I/O?

It will work well only when the number of drives is small (e.g. three
drives): the parity drive is used as a valid source of data, so a read
does not always have to seek on every data drive. I think that,
theoretically, you can save at most 0.33 (1/3) of all seeks. I don't
know where the 40% number comes from.
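
The component-count rule from the answer to question 1 can be checked
with a few lines of Python. This is only a sketch of the arithmetic
described above, not how graid3 itself is implemented; the function
names are made up for illustration, and the 512-byte drive sector is
taken from the diskinfo output for /dev/ad4.

```python
# Sketch of the arithmetic only; these helpers are hypothetical and are
# not part of graid3 or GEOM.

DRIVE_SECTOR_SIZE = 512  # bytes per sector on each component (assumed, as on /dev/ad4)


def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def is_valid_component_count(components: int) -> bool:
    """Return True if components is 3, 5, 9, 17, ... (2^n + 1, n >= 1)."""
    data_drives = components - 1  # one component holds the parity
    return data_drives >= 2 and is_power_of_two(data_drives)


def array_sector_size(components: int) -> int:
    """Sector size of the array: one drive sector from each data drive."""
    if not is_valid_component_count(components):
        raise ValueError(f"{components} components is not of the form 2^n + 1")
    return (components - 1) * DRIVE_SECTOR_SIZE


if __name__ == "__main__":
    for count in range(3, 18):
        if is_valid_component_count(count):
            print(count, "components ->", array_sector_size(count), "byte sectors")
```

Run as a script, this lists 3, 5, 9 and 17 components with array sector
sizes of 1024, 2048, 4096 and 8192 bytes. Six components (5 data +
parity) are rejected because 5*512 = 2560 is not a power of two.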