From owner-freebsd-questions Tue Feb 13 21:42:40 2001
Date: Tue, 13 Feb 2001 23:41:47 -0600
From: David Schooley
To: Greg Lehey
Cc: freebsd-questions@FreeBSD.ORG
Subject: Re: Vinum behavior (long)
In-Reply-To: <20010212195422.S47700@wantadilla.lemis.com>

At 7:54 PM +1030 2/12/01, Greg Lehey wrote:
>On Monday, 12 February 2001 at 0:28:04 -0600, David Schooley wrote:
>> I have been doing some experimenting with vinum, primarily to
>> understand it before putting it to regular use. I have a few
>> questions, primarily due to oddities I can't explain.
>>
>> The setup consists of 4 identical 30GB ATA drives, each on its own
>> channel. One pair of channels comes off of the motherboard
>> controller; the other pair hangs off of a PCI card. I am running
>> 4.2-STABLE, cvsup'ed some time within the past week.
>>
>> The configuration file I am using is as follows and is fairly close
>> to the examples in the man page and elsewhere, although it raises
>> some questions by itself.
>> What I attempted to do was make sure each
>> drive was mirrored to the corresponding drive on the other
>> controller, i.e., 1<->3 and 2<->4:
>>
>> ***
>> drive drive1 device /dev/ad0s1d
>> drive drive2 device /dev/ad2s1d
>> drive drive3 device /dev/ad4s1d
>> drive drive4 device /dev/ad6s1d
>>
>> volume raid setupstate
>> plex org striped 300k
>> sd length 14655m drive drive1
>> sd length 14655m drive drive2
>> sd length 14655m drive drive3
>> sd length 14655m drive drive4
>> plex org striped 300k
>> sd length 14655m drive drive3
>> sd length 14655m drive drive4
>> sd length 14655m drive drive1
>> sd length 14655m drive drive2
>> ***
>>
>> I wanted to see what would happen if I lost an entire IDE controller,
>> so I set everything up, mounted the new volume, and copied over
>> everything from /usr/local. I shut the machine down, cut the power to
>> drives 3 and 4, and restarted. Upon restart, vinum reported that
>> drives 3 and 4 had failed. If my understanding is correct, then I
>> should have been OK, since any data on drives 3 and 4 would have been
>> a copy of what was on drives 1 and 2, respectively.
>
>Correct.
>
>> For the next part of the test, I attempted to duplicate a directory
>> in the raid version of /usr/local. It partially worked, but there
>> were errors
>
>What errors?

Here is part of /var/log/messages. This is at the point when I tried
to write to the RAID with two of the drives failed.
Feb 13 22:19:52 bicycle /kernel: vinum: raid.p0.s3 is stale by force
Feb 13 22:19:52 bicycle /kernel: vinum: raid.p1.s0 is stale by force
Feb 13 22:19:52 bicycle /kernel: vinum: raid.p0.s2 is stale by force
Feb 13 22:19:52 bicycle /kernel: vinum: raid.p1.s1 is stale by force
Feb 13 22:19:52 bicycle /kernel: spec_getpages:(#vinum/0) I/O read failure: (error=0) bp 0xc48b97d4 vp 0xca5bdec0
Feb 13 22:19:52 bicycle /kernel: size: 28672, resid: 28672, a_count: 28672, valid: 0x0
Feb 13 22:19:52 bicycle /kernel: nread: 0, reqpage: 0, pindex: 9, pcount: 7
Feb 13 22:19:52 bicycle /kernel: vm_fault: pager read error, pid 283 (cp)
Feb 13 22:19:52 bicycle /kernel: spec_getpages:(#vinum/0) I/O read failure: (error=0) bp 0xc48b9688 vp 0xca5bdec0
Feb 13 22:19:52 bicycle /kernel: size: 65536, resid: 65536, a_count: 65536, valid: 0x0
Feb 13 22:19:52 bicycle /kernel: nread: 0, reqpage: 0, pindex: 200, pcount: 16
Feb 13 22:19:52 bicycle /kernel: vm_fault: pager read error, pid 283 (cp)
Feb 13 22:19:52 bicycle /kernel: spec_getpages:(#vinum/0) I/O read failure: (error=0) bp 0xc48b9688 vp 0xca5bdec0
Feb 13 22:19:52 bicycle /kernel: size: 36864, resid: 36864, a_count: 36864, valid: 0x0

Here is the output of "vinum list". p0.s2, p0.s3, p1.s0, and p1.s1 are
the "failed" subdisks. I used smaller subdisks this time to keep the
recovery time down during testing, but everything else is the same as
before.
4 drives:
D drive1 State: up	Device /dev/ad0s1d	Avail: 28287/29311 MB (96%)
D drive2 State: up	Device /dev/ad2s1d	Avail: 28287/29311 MB (96%)
D drive3 State: up	Device /dev/ad4s1d	Avail: 28287/29311 MB (96%)
D drive4 State: up	Device /dev/ad6s1d	Avail: 28287/29311 MB (96%)

1 volumes:
V raid      State: up	Plexes: 2	Size: 2047 MB

2 plexes:
P raid.p0 S State: up	Subdisks: 4	Size: 2047 MB
P raid.p1 S State: up	Subdisks: 4	Size: 2047 MB

8 subdisks:
S raid.p0.s0 State: up	PO: 0 B		Size: 511 MB
S raid.p0.s1 State: up	PO: 300 kB	Size: 511 MB
S raid.p0.s2 State: up	PO: 600 kB	Size: 511 MB
S raid.p0.s3 State: up	PO: 900 kB	Size: 511 MB
S raid.p1.s0 State: up	PO: 0 B		Size: 511 MB
S raid.p1.s1 State: up	PO: 300 kB	Size: 511 MB
S raid.p1.s2 State: up	PO: 600 kB	Size: 511 MB
S raid.p1.s3 State: up	PO: 900 kB	Size: 511 MB

Vinum history file:

13 Feb 2001 22:02:50.424135 *** vinum started ***
13 Feb 2001 22:02:50.424819 create -f vinum2.conf
drive drive1 device /dev/ad0s1d
drive drive2 device /dev/ad2s1d
drive drive3 device /dev/ad4s1d
drive drive4 device /dev/ad6s1d
volume raid setupstate
plex org striped 300k
sd length 512m drive drive1
sd length 512m drive drive2
sd length 512m drive drive3
sd length 512m drive drive4
plex org striped 300k
sd length 512m drive drive3
sd length 512m drive drive4
sd length 512m drive drive1
sd length 512m drive drive2
13 Feb 2001 22:02:50.438116 *** Created devices ***
13 Feb 2001 22:15:05.884974 *** vinum started ***
13 Feb 2001 22:15:05.935591 list
13 Feb 2001 22:15:18.232052 *** vinum started ***
13 Feb 2001 22:15:19.258144 list
13 Feb 2001 22:15:25.930953 quit
13 Feb 2001 22:26:38.521465 *** vinum started ***
13 Feb 2001 22:26:39.499981 list
13 Feb 2001 22:26:53.305830 start raid.p0
13 Feb 2001 22:27:00.825452 start raid.p1
13 Feb 2001 22:27:03.218408 list

>
>> during the copy and only about two thirds of the data was
>> successfully copied.
>>
>> Question #1: Shouldn't this have worked?
>
>Answer: Yes, it should have. What went wrong?

See above.
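As an aside on how those subdisk names relate to the on-disk layout: the way a striped plex spreads data can be sketched with generic RAID-0 round-robin arithmetic. This is an illustration only, not vinum's actual code; the 300 kB stripe size and the four subdisks per plex come from the configuration above.

```shell
# Generic round-robin striping sketch (assumption: plain RAID-0 layout,
# not taken from the vinum source). With a 300 kB stripe and 4 subdisks,
# a byte offset into the plex lands on subdisk (offset / stripe) mod 4.
stripe=$((300 * 1024))   # "300k" in the config: 307200 bytes
nsd=4                    # subdisks per plex

for offset in 0 307200 614400 921600 1228800; do
    sd=$(( (offset / stripe) % nsd ))
    echo "offset $offset -> subdisk $sd"
done
```

Because the second plex lists its drives in the order 3, 4, 1, 2, the stripe that plex 0 writes to drives 3 and 4 lands on drives 1 and 2 in plex 1, which is what makes the 1<->3, 2<->4 mirroring described above hold.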
>
>> After I "fixed" the "broken" controller and restarted the machine,
>> vinum's list looked like this:
>>
>> ***
>> 4 drives:
>> D drive1 State: up	Device /dev/ad0s1d	Avail: 1/29311 MB (0%)
>> D drive2 State: up	Device /dev/ad2s1d	Avail: 1/29311 MB (0%)
>> D drive3 State: up	Device /dev/ad4s1d	Avail: 1/29311 MB (0%)
>> D drive4 State: up	Device /dev/ad6s1d	Avail: 1/29311 MB (0%)
>>
>> 1 volumes:
>> V raid      State: up	Plexes: 2	Size: 57 GB
>>
>> 2 plexes:
>> P raid.p0 S State: corrupt	Subdisks: 4	Size: 57 GB
>> P raid.p1 S State: corrupt	Subdisks: 4	Size: 57 GB
>>
>> 8 subdisks:
>> S raid.p0.s0 State: up		PO: 0 B		Size: 14 GB
>> S raid.p0.s1 State: up		PO: 300 kB	Size: 14 GB
>> S raid.p0.s2 State: stale	PO: 600 kB	Size: 14 GB
>> S raid.p0.s3 State: stale	PO: 900 kB	Size: 14 GB
>> S raid.p1.s0 State: stale	PO: 0 B		Size: 14 GB
>> S raid.p1.s1 State: stale	PO: 300 kB	Size: 14 GB
>> S raid.p1.s2 State: up		PO: 600 kB	Size: 14 GB
>> S raid.p1.s3 State: up		PO: 900 kB	Size: 14 GB
>> ***
>>
>> This makes sense. Now after restarting raid.p0 and waiting for
>> everything to resync, I got this:
>>
>> ***
>> 2 plexes:
>> P raid.p0 S State: up	Subdisks: 4	Size: 57 GB
>> P raid.p1 S State: corrupt	Subdisks: 4	Size: 57 GB
>>
>> 8 subdisks:
>> S raid.p0.s0 State: up		PO: 0 B		Size: 14 GB
>> S raid.p0.s1 State: up		PO: 300 kB	Size: 14 GB
>> S raid.p0.s2 State: up		PO: 600 kB	Size: 14 GB
>> S raid.p0.s3 State: up		PO: 900 kB	Size: 14 GB
>> S raid.p1.s0 State: stale	PO: 0 B		Size: 14 GB  <--- still stale
>
>Please don't wrap output.

Sorry. One of these days I'll ditch Eudora.

>> S raid.p1.s1 State: stale	PO: 300 kB	Size: 14 GB  <--- still stale
>> S raid.p1.s2 State: up		PO: 600 kB	Size: 14 GB
>> S raid.p1.s3 State: up		PO: 900 kB	Size: 14 GB
>> ***
>>
>> Now the only place that raid.p0.s2 and raid.p0.s3 could have gotten
>> their data is from raid.p1.s0 and raid.p1.s1, neither of which were
>> involved in the "event".
>
>Correct.
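For anyone reproducing this, the manual recovery implied by the listings above can be sketched as an interactive vinum session. This is an assumption pieced together from the `start` commands visible in the history earlier in this message, not output captured from this machine:

```
# vinum
vinum -> start raid.p0     (resyncs p0's stale subdisks from p1)
vinum -> start raid.p1     (then revives p1's remaining stale subdisks)
vinum -> list              (confirm every subdisk is back in the up state)
```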
>
>> Question #2: Since the data on raid.p0 now matches raid.p1,
>> shouldn't raid.p1 have come up automatically and without having to
>> copy data from raid.p0?
>
>No. According to the output above, raid.p1 hasn't been started yet.
>There's also no indication in your message or in the output that you
>tried to start it. If the start had died in the middle, the list
>command would have shown that.

At this point I had not started raid.p1 because I wanted to see
whether it would start itself. I thought it might, since all of the
drives have good data once raid.p0 comes up. According to your vinum
web pages, this requires a logging facility that is not implemented
yet. I don't usually unplug drives to watch things break, so I won't
lose sleep over it.

>Getting back to the first problem, my first guess is that you tried
>only 'start raid.p0', and didn't do a 'start raid.p1'. If you did,
>I'd like to see the output I ask for in the man page and at
>http://www.vinumvm.org/vinum/how-to-debug.html. It's too detailed to
>repeat here.

I think I have included all of the requested output. Kernel debugging,
if it turns out to be necessary, won't be possible until next week at
the earliest. Everything works great with RAID-1; RAID-0+1 works if I
pull only one drive. I don't think I have faulty hardware.

Thanks.

-- 
---------------------------------------------------
David C. Schooley, Ph.D.
Transmission Operations/Technical Operations Support
Commonwealth Edison Company
work phone: 630-691-4466/(472)-4466
work email: mailto:david.c.schooley@ucm.com
home email: mailto:dcschooley@ieee.org

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message