Date:      Thu, 25 Oct 2001 10:30:00 +0930
From:      Greg Lehey <grog@FreeBSD.org>
To:        Ben Eisenbraun <bene@nitrogen.nexthop.net>
Cc:        freebsd-questions@FreeBSD.org
Subject:   Re: recovery of corrupt vinum plexes?
Message-ID:  <20011025103000.A25441@wantadilla.lemis.com>
In-Reply-To: <20011023055005.A44324@nitrogen.nexthop.net>; from bene@nitrogen.nexthop.net on Tue, Oct 23, 2001 at 05:50:05AM -0400
References:  <20011023044950.A43848@nitrogen.nexthop.net> <20011023183023.M27668@wantadilla.lemis.com> <20011023055005.A44324@nitrogen.nexthop.net>

On Tuesday, 23 October 2001 at  5:50:05 -0400, Ben Eisenbraun wrote:
> On Tue, Oct 23, 2001 at 06:30:23PM +0930, Greg Lehey wrote:
>> On Tuesday, 23 October 2001 at  4:49:50 -0400, Ben Eisenbraun wrote:
>>> vinum -> list
>>> 6 drives:
>>> D max1                  State: up       Device /dev/ad0e        Avail: 19529/19529 MB (100%)
>>> D max2                  State: up       Device /dev/ad2e        Avail: 19529/19529 MB (100%)
>
> <snip the drive list--I put another one at the end>
>
>> Is this information correct (i.e. you have six drives)?
>
> Yes, 6 IDE drives across three controllers.
>
>>> $*8*^O^U0*9s^Ou^VhZ#h$#4#4*u^P4*E^LE^LPue[^_^LU^PWVS}u^L%^O^OGst%^?
>>>
>>> (I have a feeling this bodes ill.)
>>
>> Yes.  This drive contains no Vinum label.  Are ad4s1e and ad6s1e the
>> correct device names?
>
> I believe so.  Those are the FreeBSD partitions I used in the vinum
> config file and disklabel shows this:
>
> <correct output omitted>
>
> ad0 and ad2, the mirrored volume, fsck'ed cleanly and appear to be fine.
>
> ad4 and ad6 are identical drives and are partitioned exactly the same.
> ad8 and ad10 are identical drives and are partitioned exactly the same.

This looks bad.  I don't know how, but it's fairly evident that your
Vinum label has got clobbered.  I've never seen that before.

> I haven't made any local changes to system or kernel sources.  It
> didn't write a crash dump to disk on any of the following crashes.
> I didn't catch the console messages on the first boot before the
> first panic.  Here's the trace from that original panic:
>
> db> trace
> devsw(0,c126e1cc,c134ee00,c64f75b0,1) at devsw+0x6
> launch_requests(c134ee00,0,c64f75b0,ccf5e240,c1355000) at launch_requests+0x308
> vinumstart(c64f75b0,0,c64f75b0,ce2e8da0,c0199a51) at vinumstart+0x19a
> vinumstrategy(c64f75b0,c134ce80,c64f75b0,1,ce2e8dac) at vinumstrategy+0x92
> spec_strategy(ce2e8dd0,ce2e8db8,c021de25,ce2e8dd0,ce2e8dec) at spec_strategy+0x8d
> spec_vnoperate(ce2e8dd0,ce2e8dec,c021d6e1,ce2e8dd0,c64f75b0) at spec_vnoperate+0x15
> ufs_vnoperatespec(ce2e8dd0,c64f75b0,1,6800c444,c02acc60) at ufs_vnoperatespec+0x15
> ufs_strategy(ce2e8e14,ce2e8e20,c0185eb8,ce2e8e14,1c00) at ufs_strategy+0xc5
> ufs_vnoperate(ce2e8e14) at ufs_vnoperate+0x15
> bwrite(c64f75b0,ce2e8e38,c018b5b5,ce2e8e78,ce2e8e44) at bwrite+0x20c
> vop_stdbwrite(ce2e8e78,ce2e8e44,c021dded,ce2e8e78,ce2e8e84) at vop_stdbwrite+0xf
> vop_defaultop(ce2e8e78,ce2e8e84,c0186e64,ce2e8e78,c64f75b0) at vop_defaultop+0x15
> ufs_vnoperate(ce2e8e78,c64f75b0,6800c444,ccf5d580,10) at ufs_vnoperate+0x15
> vfs_bio_awrite(c64f75b0) at vfs_bio_awrite+0x24c
> ffs_fsync(ce2e8ee8,c1355e00,0,cc01e2a0,ce2e8ee8) at ffs_fsync+0x28b
> ffs_sync(c1355e00,2,c0a35900,cc01e2a0,c1355e00) at ffs_sync+0x126
> sync(cc01e2a0,ce2e8f80,bfbffdd4,bfbffdd4,2) at sync+0x6f
> syscall2(2f,2f,2f,2,bfbffdd4) at syscall2+0x23d
> Xint0x80_syscall() at Xint0x80_syscall+0x2b

Hmm.  That could have been just about anything, probably a corrupt
request structure.  Without a dump it's difficult to say very much,
but in view of the fact that the drives have gone away, it's possible
that it was trying to talk to them anyway.  I'd like to see a dump of
this.

> It hung from there, so I had someone reset it.  I'm accessing it
> via serial console.  The next messages I had were from midway
> through the boot:
>
> vinum: reading configuration from /dev/ad8s1e
> vinum: stripe-mirror.p0 is faulty
> vinum: stripe-mirror.p1 is faulty
> vinum: stripe-mirror is down
> vinum: updating configuration from /dev/ad10s1e
> vinum: updating configuration from /dev/ad2s1e
> vinum: updating configuration from /dev/ad0s1e
> vinum: stripe-mirror.p0 is corrupt
> vinum: stripe-mirror is up
> vinum: stripe-mirror.p1 is corrupt
> vinum: /dev is mounted read-only, not rebuilding /dev/vinum
> Warning: defective objects

Note that ad4 and ad6 are already gone.  What comes next is probably
not so important.

> D max3                  State: referenced       Device  Avail: 0/0 MB
> D max4                  State: referenced       Device  Avail: 0/0 MB
> P stripe-mirror.p0    S State: corrupt  Subdisks:     2 Size:        111 GB
> P stripe-mirror.p1    S State: corrupt  Subdisks:     2 Size:        111 GB
> S stripe-mirror.p0.s0   State: crashed  PO:        0  B Size:         55 GB
> S stripe-mirror.p1.s0   State: crashed  PO:        0  B Sdize:         55 aGB
> 0s1: type 0xa5, start 63, end = 17767889, size 17767827 : OK
> swapon: adding /dev/da0s1b as swap device
> ad4s1: type 0xa5, start 63, end = 120053744, size 120053682 : OK
> swapon: adding /dev/ad4s1b as swap device
> ad6s1: type 0xa5, start 63, end = 120053744, size 120053682 : OK
> swapon: adding /dev/ad6s1b as swap device
> Automatic boot in progress...
> /dev/da0s1a: 2331 files, 44030 used, 79985 free (793 frags, 9899 blocks, 0.6% fragmentation)
> /dev/vinum/strippe-mirror: iCANNOT READ: BLKd 16
> /dev/vinum/str2ipe-mirror: UNEX2PECTED SOFT UPDA TE INCONSISTENCY(; RUN fsck MANUAfLLY.
> fsck in frsee(): warning: pcage is already fkree.
> fsck in fr)ee(): warning: c,hunk is already  free.
> fsck in furee(): warning: ipointer to wrongd page.
>  0: exited on signal 11 (core dumped)
> /dev/vinum/stripe-mirror (/usr/home): EXITED WITH SIGNAL 11
>
> I'm not sure if the text corruption here is due to the serial console
> being flaky (although it hasn't been before).

That's not corruption, it's a second message coming out more slowly
and interleaved:

pid 22 (fsck), uid 0: exited on signal 11 (core dumped)

I haven't seen that before except on a -CURRENT machine.
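
The interleaving can be checked mechanically. What follows is only a sketch in Python: the "clean" texts in PAIRS are assumed reconstructions of what fsck and malloc meant to print, not something taken from the console log, and the helper names are made up for illustration. It walks each garbled line against its assumed clean form and collects the characters that don't belong.

```python
# Garbled console lines as quoted above, each paired with the text it is
# ASSUMED to have been before the second message got mixed in.
PAIRS = [
    ("/dev/vinum/strippe-mirror: iCANNOT READ: BLKd 16",
     "/dev/vinum/stripe-mirror: CANNOT READ: BLK 16"),
    ("/dev/vinum/str2ipe-mirror: UNEX2PECTED SOFT UPDA TE INCONSISTENCY(; RUN fsck MANUAfLLY.",
     "/dev/vinum/stripe-mirror: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY."),
    ("fsck in frsee(): warning: pcage is already fkree.",
     "fsck in free(): warning: page is already free."),
    ("fsck in fr)ee(): warning: c,hunk is already  free.",
     "fsck in free(): warning: chunk is already free."),
    ("fsck in furee(): warning: ipointer to wrongd page.",
     "fsck in free(): warning: pointer to wrong page."),
]

# The part of the second message that came out on a line of its own.
TAIL = " 0: exited on signal 11 (core dumped)"


def stray_chars(garbled, clean):
    """Return the characters of `garbled` that are not part of `clean`.

    Greedy two-pointer walk: a character that matches the next expected
    clean character is consumed, anything else is counted as stray.
    """
    stray = []
    i = 0
    for ch in garbled:
        if i < len(clean) and ch == clean[i]:
            i += 1
        else:
            stray.append(ch)
    if i != len(clean):
        raise ValueError("clean text is not contained in the garbled line")
    return "".join(stray)


def recover(pairs, tail):
    """Join the stray characters of all lines and append the tail."""
    return "".join(stray_chars(g, c) for g, c in pairs) + tail


if __name__ == "__main__":
    print(recover(PAIRS, TAIL))
```

With the lines exactly as quoted, this prints "pid22 (fsck), uid 0: exited on signal 11 (core dumped)", which is the kernel message apart from the space after "pid" (that space does not show up among the stray characters in the quoted text).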

> Here's the dumpconfig -v.
>
> <snip>
> sd name stripe-mirror.p0.s0 drive max3 plex stripe-mirror.p0 len 117225472s driveoffset 265s state crashed plexoffset 0s
> sd name stripe-mirror.p1.s0 drive max4 plex stripe-mirror.p1 len 117225472s driveoffset 265s state crashed plexoffset 0s

These are the objects that interest you.
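
To put the numbers in those two lines into perspective, here is a small Python sketch that converts the sector counts to bytes. It assumes that the "s" suffix means 512-byte sectors; the function names are hypothetical and only there for illustration.

```python
SECTOR_BYTES = 512  # assumption: the "s" suffix counts 512-byte sectors

# One of the two subdisk lines from the dumpconfig output quoted above.
SD_LINE = ("sd name stripe-mirror.p0.s0 drive max3 plex stripe-mirror.p0 "
           "len 117225472s driveoffset 265s state crashed plexoffset 0s")


def parse_sd(line):
    """Split an 'sd' line into a dict of keyword -> value strings."""
    words = line.split()
    if not words or words[0] != "sd":
        raise ValueError("not a subdisk line")
    # After the leading "sd", the words come in keyword/value pairs.
    return dict(zip(words[1::2], words[2::2]))


def sectors_to_bytes(field):
    """Convert a value such as '265s' to a byte count."""
    if not field.endswith("s"):
        raise ValueError("expected a sector count ending in 's'")
    return int(field[:-1]) * SECTOR_BYTES


if __name__ == "__main__":
    sd = parse_sd(SD_LINE)
    length = sectors_to_bytes(sd["len"])
    offset = sectors_to_bytes(sd["driveoffset"])
    print(sd["drive"], "length:", length, "bytes,",
          round(length / 2**30, 1), "GiB")
    print(sd["drive"], "data starts at byte", offset)
```

Under that assumption each subdisk is 60019441664 bytes long (about 55.9 GiB, in line with the "55 GB" in the listing), and the data starts 135680 bytes into the partition.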

> <snip>
> Drive /dev/ad2e: 19 GB (20478108160 bytes)
>
> More info that may or may not be useful:

You've truncated the dumpconfig output.  Did ad4 or ad6 show up?  I'm
assuming they didn't.

OK, let's hope that only the Vinum labels are corrupted.  You have a
fair chance that the data section hasn't been overwritten, since
there's a copy of the config information (128 kB) between the label
and the data.  In that case, you should be able to recreate the
objects with this config file:

drive max3 device /dev/ad4s1e
drive max4 device /dev/ad6s1e

That's right, just the drives (check that I have the names right!).
Stop vinum if it's running, then do:

 # vinum 
 vinum -> create newconfig

(assuming you've called the new file newconfig).  You should end up
with exactly these two objects, and they should be up.  Next, do:

 vinum -> start

After that, all objects should be there, but they almost certainly
won't be the way you want them to be.  Send me the output of the
'vinum list' and 'vinum list -v' commands, and I'll tell you what to
do next.

Greg
--
When replying to this message, please copy the original recipients.
If you don't, I may ignore the reply.
For more information, see http://www.lemis.com/questions.html
See complete headers for address and phone numbers

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message



