Date: Mon, 12 Feb 2001 19:54:22 +1030
From: Greg Lehey
To: David Schooley
Cc: freebsd-questions@FreeBSD.ORG
Subject: Re: Vinum behavior (long)
Message-ID: <20010212195422.S47700@wantadilla.lemis.com>
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog

On Monday, 12 February 2001 at 0:28:04 -0600, David Schooley wrote:
> I have been doing some experimenting with vinum, primarily to
> understand it before putting it to regular use. I have a few
> questions, mostly about oddities I can't explain.
>
> The setup consists of 4 identical 30 GB ATA drives, each on its own
> channel. One pair of channels comes off the motherboard controller;
> the other pair hangs off a PCI card. I am running 4.2-STABLE,
> cvsup'ed some time within the past week.
>
> The configuration file I am using is as follows and is fairly close
> to the examples in the man page and elsewhere, although it raises
> some questions by itself. What I attempted to do was make sure each
> drive was mirrored to the corresponding drive on the other
> controller, i.e. 1<->3 and 2<->4:
>
> ***
> drive drive1 device /dev/ad0s1d
> drive drive2 device /dev/ad2s1d
> drive drive3 device /dev/ad4s1d
> drive drive4 device /dev/ad6s1d
>
> volume raid setupstate
> plex org striped 300k
> sd length 14655m drive drive1
> sd length 14655m drive drive2
> sd length 14655m drive drive3
> sd length 14655m drive drive4
> plex org striped 300k
> sd length 14655m drive drive3
> sd length 14655m drive drive4
> sd length 14655m drive drive1
> sd length 14655m drive drive2
> ***
>
> I wanted to see what would happen if I lost an entire IDE controller,
> so I set everything up, mounted the new volume and copied over
> everything from /usr/local. I shut the machine down, cut the power to
> drives 3 and 4, and restarted. Upon restart, vinum reported that
> drives 3 and 4 had failed. If my understanding is correct, I should
> have been OK, since any data on drives 3 and 4 would have been a copy
> of what was on drives 1 and 2 respectively.

Correct.

> For the next part of the test, I attempted to duplicate a directory
> in the raid version of /usr/local. It partially worked, but there
> were errors

What errors?

> during the copy, and only about two thirds of the data was
> successfully copied.
>
> Question #1: Shouldn't this have worked?

Answer: Yes, it should have.  What went wrong?
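Incidentally, your layout reasoning is right: the second plex rotates
the drive order by two, so each stripe lives on one drive from each
controller pair. A back-of-envelope check (a plain sh sketch, not a
Vinum command; the 300 kB stripe size and drive order are taken from
your config above):

***
# For a few volume offsets, work out which drive holds the data in
# each plex.  p0's subdisk order is drive1 drive2 drive3 drive4;
# p1's is rotated by two: drive3 drive4 drive1 drive2.
for off_kb in 0 300 600 900 1200; do
    stripe=$(( $off_kb / 300 ))
    p0=$(( $stripe % 4 + 1 ))          # drive number in plex p0
    p1=$(( ($stripe + 2) % 4 + 1 ))    # drive number in plex p1
    echo "offset ${off_kb}k: p0 -> drive$p0, p1 -> drive$p1"
done
***

Every offset ends up with one copy on drives 1 or 2 and one on drives
3 or 4, which is why losing the second controller should still have
left a complete copy online.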
> After I "fixed" the "broken" controller and restarted the machine, > vinum's list looked like this: > > *** > 4 drives: > D drive1 State: up Device /dev/ad0s1d Avail: 1/29311 MB (0%) > D drive2 State: up Device /dev/ad2s1d Avail: 1/29311 MB (0%) > D drive3 State: up Device /dev/ad4s1d Avail: 1/29311 MB (0%) > D drive4 State: up Device /dev/ad6s1d Avail: 1/29311 MB (0%) > > 1 volumes: > V raid State: up Plexes: 2 Size: 57 GB > > 2 plexes: > P raid.p0 S State: corrupt Subdisks: 4 Size: 57 GB > P raid.p1 S State: corrupt Subdisks: 4 Size: 57 GB > > 8 subdisks: > S raid.p0.s0 State: up PO: 0 B Size: 14 GB > S raid.p0.s1 State: up PO: 300 kB Size: 14 GB > S raid.p0.s2 State: stale PO: 600 kB Size: 14 GB > S raid.p0.s3 State: stale PO: 900 kB Size: 14 GB > S raid.p1.s0 State: stale PO: 0 B Size: 14 GB > S raid.p1.s1 State: stale PO: 300 kB Size: 14 GB > S raid.p1.s2 State: up PO: 600 kB Size: 14 GB > S raid.p1.s3 State: up PO: 900 kB Size: 14 GB > *** > > This makes sense. Now after restarting raid.p0 and waiting for > everything to resync, I got this: > > *** > 2 plexes: > P raid.p0 S State: up Subdisks: 4 Size: 57 GB > P raid.p1 S State: corrupt Subdisks: 4 Size: 57 GB > > 8 subdisks: > S raid.p0.s0 State: up PO: 0 B Size: 14 GB > S raid.p0.s1 State: up PO: 300 kB Size: 14 GB > S raid.p0.s2 State: up PO: 600 kB Size: 14 GB > S raid.p0.s3 State: up PO: 900 kB Size: 14 GB > S raid.p1.s0 State: stale PO: 0 B Size: 14 GB <--- still stale Please don't wrap output. > S raid.p1.s1 State: stale PO: 300 kB Size: 14 GB <--- still stale > S raid.p1.s2 State: up PO: 600 kB Size: 14 GB > S raid.p1.s3 State: up PO: 900 kB Size: 14 GB > *** > > Now the only place that raid.p0.s2 and raid.p0.s3 could have gotten > their data is from raid.p1.s0 and raid.p1.s1, neither of which were > involved in the "event". Correct. > Question #2: Since the data on raid.p0 now matches raid.p1, > shouldn't raid.p1 have come up automatically and without having to > copy data from raid.p0? No. According to the output above, raid.p1 hasn't been started yet. There's also no indication in your message or in the output that you tried to start it. If the start had died in the middle, the list command would have shown that. > The configuration file below makes sense, but suffers a slight > performance penalty over the first one. > > Question #3: Is there a reason why "mirror -s" does it this way > instead of striping to all 4 disks? Yes. mirror -s is a pretty bare bones config utility. You have so many different options with Vinum, and mirror just does one of them. > I kind of prefer it this way, but I'm still curious. You're better off with the first config. Your performance will be more even. > drive drive1 device /dev/ad0s1d > drive drive2 device /dev/ad2s1d > drive drive3 device /dev/ad4s1d > drive drive4 device /dev/ad6s1d > > volume raid setupstate > plex org striped 300k > sd length 29310 m drive drive1 > sd length 29310 m drive drive2 > plex org striped 300k > sd length 29310 m drive drive3 > sd length 29310 m drive drive4 > *** > > While reading through the archives, I noticed several occasions where > it was stated that a power-of-two stripe size was potentially bad > because all of the superblocks could end up on the same disk, thereby > impacting performance, but the documentation and "mirror -s" all use > a stripe size of 256k. > > Question 4: Is the power-of-two concern still valid, and if so, > shouldn't the documentation and "mirror -s" function be changed? Yes. 
Getting back to the first problem: my first guess is that you tried
only 'start raid.p0' and didn't do a 'start raid.p1'. If you did, I'd
like to see the output I ask for in the man page and at
http://www.vinumvm.org/vinum/how-to-debug.html. It's too detailed to
repeat here.

Greg
--
When replying to this message, please copy the original recipients.
If you don't, I may ignore the reply.  For more information, see
http://www.lemis.com/questions.html
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers
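P.S.: For completeness, the recovery sequence I have in mind looks
something like this (a sketch only, using the object names from your
listing; the revive messages you see will differ):

***
vinum start raid.p0    # revives raid.p0.s2 and raid.p0.s3 from plex p1
vinum start raid.p1    # the step I suspect was missed; revives p1.s0 and p1.s1
vinum list             # every plex and subdisk should now show State: up
***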