From owner-freebsd-questions Sun Feb 11 22:28:17 2001
Date: Mon, 12 Feb 2001 00:28:04 -0600
To: freebsd-questions@FreeBSD.ORG
From: David Schooley
Subject: Vinum behavior (long)

I have been doing some experimenting with vinum, primarily to understand it before putting it to regular use. I have a few questions, mostly prompted by oddities I can't explain.

The setup consists of 4 identical 30GB ATA drives, each on its own channel. One pair of channels comes off of the motherboard controller; the other pair hangs off of a PCI card. I am running 4.2-STABLE, cvsup'ed some time within the past week.

The configuration file I am using is as follows and is fairly close to the examples in the man page and elsewhere, although it raises some questions by itself. What I attempted to do was make sure each drive was mirrored to the corresponding drive on the other controller, i.e., 1<->3 and 2<->4:

***
drive drive1 device /dev/ad0s1d
drive drive2 device /dev/ad2s1d
drive drive3 device /dev/ad4s1d
drive drive4 device /dev/ad6s1d
volume raid setupstate
 plex org striped 300k
  sd length 14655m drive drive1
  sd length 14655m drive drive2
  sd length 14655m drive drive3
  sd length 14655m drive drive4
 plex org striped 300k
  sd length 14655m drive drive3
  sd length 14655m drive drive4
  sd length 14655m drive drive1
  sd length 14655m drive drive2
***

I wanted to see what would happen if I lost an entire IDE controller, so I set everything up, mounted the new volume, and copied over everything from /usr/local. I shut the machine down, cut the power to drives 3 and 4, and restarted. Upon restart, vinum reported that drives 3 and 4 had failed. If my understanding is correct, I should have been OK, since any data on drives 3 and 4 would have been a copy of what was on drives 1 and 2, respectively.

For the next part of the test, I attempted to duplicate a directory in the raid version of /usr/local. It partially worked, but there were errors during the copy and only about two thirds of the data was successfully copied.

Question #1: Shouldn't this have worked?
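For reference, "setting everything up" amounts to something like the following sketch (the configuration file name and mount point here are only examples, not what I actually used):

***
# load the configuration file above and create the vinum objects
# (/etc/vinum.raid.conf is just an example name)
vinum create /etc/vinum.raid.conf
vinum list                          # everything should show "up"

# put a file system on the new volume and mount it somewhere for testing
newfs -v /dev/vinum/raid
mount /dev/vinum/raid /mnt/raid     # mount point is arbitrary
***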
After I "fixed" the "broken" controller and restarted the machine, vinum's list looked like this: *** 4 drives: D drive1 State: up Device /dev/ad0s1d Avail: 1/29311 MB (0%) D drive2 State: up Device /dev/ad2s1d Avail: 1/29311 MB (0%) D drive3 State: up Device /dev/ad4s1d Avail: 1/29311 MB (0%) D drive4 State: up Device /dev/ad6s1d Avail: 1/29311 MB (0%) 1 volumes: V raid State: up Plexes: 2 Size: 57 GB 2 plexes: P raid.p0 S State: corrupt Subdisks: 4 Size: 57 GB P raid.p1 S State: corrupt Subdisks: 4 Size: 57 GB 8 subdisks: S raid.p0.s0 State: up PO: 0 B Size: 14 GB S raid.p0.s1 State: up PO: 300 kB Size: 14 GB S raid.p0.s2 State: stale PO: 600 kB Size: 14 GB S raid.p0.s3 State: stale PO: 900 kB Size: 14 GB S raid.p1.s0 State: stale PO: 0 B Size: 14 GB S raid.p1.s1 State: stale PO: 300 kB Size: 14 GB S raid.p1.s2 State: up PO: 600 kB Size: 14 GB S raid.p1.s3 State: up PO: 900 kB Size: 14 GB *** This makes sense. Now after restarting raid.p0 and waiting for everything to resync, I got this: *** 2 plexes: P raid.p0 S State: up Subdisks: 4 Size: 57 GB P raid.p1 S State: corrupt Subdisks: 4 Size: 57 GB 8 subdisks: S raid.p0.s0 State: up PO: 0 B Size: 14 GB S raid.p0.s1 State: up PO: 300 kB Size: 14 GB S raid.p0.s2 State: up PO: 600 kB Size: 14 GB S raid.p0.s3 State: up PO: 900 kB Size: 14 GB S raid.p1.s0 State: stale PO: 0 B Size: 14 GB <--- still stale S raid.p1.s1 State: stale PO: 300 kB Size: 14 GB <--- still stale S raid.p1.s2 State: up PO: 600 kB Size: 14 GB S raid.p1.s3 State: up PO: 900 kB Size: 14 GB *** Now the only place that raid.p0.s2 and raid.p0.s3 could have gotten their data is from raid.p1.s0 and raid.p1.s1, neither of which were involved in the "event". Question #2: Since the data on raid.p0 now matches raid.p1, shouldn't raid.p1 have come up automatically and without having to copy data from raid.p0? The configuration file below makes sense, but suffers a slight performance penalty over the first one. Question #3: Is there a reason why "mirror -s" does it this way instead of striping to all 4 disks? I kind of prefer it this way, but I'm still curious. *** drive drive1 device /dev/ad0s1d drive drive2 device /dev/ad2s1d drive drive3 device /dev/ad4s1d drive drive4 device /dev/ad6s1d volume raid setupstate plex org striped 300k sd length 29310 m drive drive1 sd length 29310 m drive drive2 plex org striped 300k sd length 29310 m drive drive3 sd length 29310 m drive drive4 *** While reading through the archives, I noticed several occasions where it was stated that a power-of-two stripe size was potentially bad because all of the superblocks could end up on the same disk, thereby impacting performance, but the documentation and "mirror -s" all use a stripe size of 256k. Question 4: Is the power-of-two concern still valid, and if so, shouldn't the documentation and "mirror -s" function be changed? Thanks, David. -- --------------------------------------------------- David C. Schooley, Ph.D. Transmission Operations/Technical Operations Support Commonwealth Edison Company work phone: 630-691-4466/(472)-4466 work email: mailto:david.c.schooley@ucm.com home email: mailto:dcschooley@ieee.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-questions" in the body of the message