From owner-freebsd-questions Sun Feb 11 22:28:17 2001
Date: Mon, 12 Feb 2001 00:28:04 -0600
To: freebsd-questions@FreeBSD.ORG
From: David Schooley
Subject: Vinum behavior (long)

I have been doing some experimenting with vinum, primarily to understand it before putting it to regular use. I have a few questions, mostly prompted by oddities I can't explain.

The setup consists of 4 identical 30GB ATA drives, each on its own channel. One pair of channels comes off of the motherboard controller; the other pair hangs off of a PCI card. I am running 4.2-STABLE, cvsup'ed some time within the past week.

The configuration file I am using is as follows and is fairly close to the examples in the man page and elsewhere, although it raises some questions by itself. What I attempted to do was make sure each drive was mirrored to the corresponding drive on the other controller, i.e., 1<->3 and 2<->4:

***
drive drive1 device /dev/ad0s1d
drive drive2 device /dev/ad2s1d
drive drive3 device /dev/ad4s1d
drive drive4 device /dev/ad6s1d
volume raid setupstate
 plex org striped 300k
  sd length 14655m drive drive1
  sd length 14655m drive drive2
  sd length 14655m drive drive3
  sd length 14655m drive drive4
 plex org striped 300k
  sd length 14655m drive drive3
  sd length 14655m drive drive4
  sd length 14655m drive drive1
  sd length 14655m drive drive2
***

I wanted to see what would happen if I lost an entire IDE controller, so I set everything up, mounted the new volume, and copied over everything from /usr/local. I shut the machine down, cut the power to drives 3 and 4, and restarted. Upon restart, vinum reported that drives 3 and 4 had failed. If my understanding is correct, I should have been OK, since any data on drives 3 and 4 would have been a copy of what was on drives 1 and 2, respectively.

For the next part of the test, I attempted to duplicate a directory in the raid version of /usr/local. It partially worked, but there were errors during the copy and only about two thirds of the data was successfully copied.

Question #1: Shouldn't this have worked?
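For reference, "setting everything up" amounts to something like the following sketch (the configuration file name and mount point here are only examples, not what I actually used):

***
# load the configuration file above and create the vinum objects
# (/etc/vinum.raid.conf is just an example name)
vinum create /etc/vinum.raid.conf
vinum list                          # everything should show "up"

# put a file system on the new volume and mount it somewhere for testing
newfs -v /dev/vinum/raid
mount /dev/vinum/raid /mnt/raid     # mount point is arbitrary
***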
After I "fixed" the "broken" controller and restarted the machine, vinum's list looked like this: *** 4 drives: D drive1 State: up Device /dev/ad0s1d Avail: 1/29311 MB (0%) D drive2 State: up Device /dev/ad2s1d Avail: 1/29311 MB (0%) D drive3 State: up Device /dev/ad4s1d Avail: 1/29311 MB (0%) D drive4 State: up Device /dev/ad6s1d Avail: 1/29311 MB (0%) 1 volumes: V raid State: up Plexes: 2 Size: 57 GB 2 plexes: P raid.p0 S State: corrupt Subdisks: 4 Size: 57 GB P raid.p1 S State: corrupt Subdisks: 4 Size: 57 GB 8 subdisks: S raid.p0.s0 State: up PO: 0 B Size: 14 GB S raid.p0.s1 State: up PO: 300 kB Size: 14 GB S raid.p0.s2 State: stale PO: 600 kB Size: 14 GB S raid.p0.s3 State: stale PO: 900 kB Size: 14 GB S raid.p1.s0 State: stale PO: 0 B Size: 14 GB S raid.p1.s1 State: stale PO: 300 kB Size: 14 GB S raid.p1.s2 State: up PO: 600 kB Size: 14 GB S raid.p1.s3 State: up PO: 900 kB Size: 14 GB *** This makes sense. Now after restarting raid.p0 and waiting for everything to resync, I got this: *** 2 plexes: P raid.p0 S State: up Subdisks: 4 Size: 57 GB P raid.p1 S State: corrupt Subdisks: 4 Size: 57 GB 8 subdisks: S raid.p0.s0 State: up PO: 0 B Size: 14 GB S raid.p0.s1 State: up PO: 300 kB Size: 14 GB S raid.p0.s2 State: up PO: 600 kB Size: 14 GB S raid.p0.s3 State: up PO: 900 kB Size: 14 GB S raid.p1.s0 State: stale PO: 0 B Size: 14 GB <--- still stale S raid.p1.s1 State: stale PO: 300 kB Size: 14 GB <--- still stale S raid.p1.s2 State: up PO: 600 kB Size: 14 GB S raid.p1.s3 State: up PO: 900 kB Size: 14 GB *** Now the only place that raid.p0.s2 and raid.p0.s3 could have gotten their data is from raid.p1.s0 and raid.p1.s1, neither of which were involved in the "event". Question #2: Since the data on raid.p0 now matches raid.p1, shouldn't raid.p1 have come up automatically and without having to copy data from raid.p0? The configuration file below makes sense, but suffers a slight performance penalty over the first one. Question #3: Is there a reason why "mirror -s" does it this way instead of striping to all 4 disks? I kind of prefer it this way, but I'm still curious. *** drive drive1 device /dev/ad0s1d drive drive2 device /dev/ad2s1d drive drive3 device /dev/ad4s1d drive drive4 device /dev/ad6s1d volume raid setupstate plex org striped 300k sd length 29310 m drive drive1 sd length 29310 m drive drive2 plex org striped 300k sd length 29310 m drive drive3 sd length 29310 m drive drive4 *** While reading through the archives, I noticed several occasions where it was stated that a power-of-two stripe size was potentially bad because all of the superblocks could end up on the same disk, thereby impacting performance, but the documentation and "mirror -s" all use a stripe size of 256k. Question 4: Is the power-of-two concern still valid, and if so, shouldn't the documentation and "mirror -s" function be changed? Thanks, David. -- --------------------------------------------------- David C. Schooley, Ph.D. Transmission Operations/Technical Operations Support Commonwealth Edison Company work phone: 630-691-4466/(472)-4466 work email: mailto:david.c.schooley@ucm.com home email: mailto:dcschooley@ieee.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-questions" in the body of the message