Date:      Fri, 19 May 2000 15:19:58 +0200
From:      "Niels Chr. Bank-Pedersen" <ncbp@bank-pedersen.dk>
To:        Greg Lehey <grog@lemis.com>, Bernd Walter <ticso@cicely.de>
Cc:        freebsd-current@FreeBSD.ORG
Subject:   Re: Possible Vinum RAID-5 problems? (was: panic: ffs_valloc: dup alloc)
Message-ID:  <20000519151958.C30889@bank-pedersen.dk>
In-Reply-To: <20000519182044.A47558@freebie.lemis.com>; from grog@lemis.com on Fri, May 19, 2000 at 06:20:44PM +0930
References:  <20000518161343.G26090@bank-pedersen.dk> <20000518232151.A30272@cicely5.cicely.de> <20000518234358.B28600@bank-pedersen.dk> <20000519000159.B30272@cicely5.cicely.de> <20000519062438.A29755@bank-pedersen.dk> <20000519075536.A31215@cicely5.cicely.de> <20000519182044.A47558@freebie.lemis.com>

On Fri, May 19, 2000 at 06:20:44PM +0930, Greg Lehey wrote:
> I only stumbled on this thread by accident.  If you have problems
> which involve Vinum, please copy me, and I may have input.

Ok, thanks.  I wasn't at all sure Vinum had anything to do with this
crash (and I still don't know whether it does), so I didn't want to
"cry wolf"  :-)

> On Friday, 19 May 2000 at  7:55:37 +0200, Bernd Walter wrote:
> > On Fri, May 19, 2000 at 06:24:39AM +0200, Niels Chr. Bank-Pedersen wrote:
> >> On Fri, May 19, 2000 at 12:01:59AM +0200, Bernd Walter wrote:
> >>> On Thu, May 18, 2000 at 11:43:58PM +0200, Niels Chr. Bank-Pedersen wrote:
> >>>> On Thu, May 18, 2000 at 11:21:51PM +0200, Bernd Walter wrote:
> >>>>> Does vinum list say that one subdisk of your R5 volume is down?
> >>>>
> >>>> 5 subdisks:
> >>>> S raid5.p0.s0           State: up	PO:        0  B Size:       4133 MB
> >>>> S raid5.p0.s1           State: up	PO:      768 kB Size:       4133 MB
> >>>> S raid5.p0.s2           State: up	PO:     1536 kB Size:       4133 MB
> >>>> S raid5.p0.s3           State: up	PO:     2304 kB Size:       4133 MB
> >>>> S raid5.p0.s4           State: up	PO:     3072 kB Size:       4133 MB
> >>>>
> >>>>  - But since my first attempt to initialize the plex crashed the
> >>>> box while only da5s1e was missing, I ran a "verify media" from
> >>>> the SCSI controller BIOS, and it did find errors.  They were all
> >>>> successfully remapped, though.
> >>>
> >>> I thought about a parity corruption bug that existed at one point.
> >>> But since all drives are up and freshly initialized, there are two
> >>> reasons why your problem should be something different.
> >>> Maybe one drive crashed and the system panicked before vinum was able
> >>> to update the state database on the drives.
> >>> Did you see anything unusual on the console before the panic message?
> >>
> >> <snip>
> >>  - after obliterating the vinum configuration I created the volume
> >> without da5, and I have now successfully copied 250+MB to the filesystem.
> >>
> >> I would have thought that hardware errors like this could be handled
> >> in a slightly more controlled manner, though.
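
(For reference, the volume above was built from a Vinum config file
roughly like the one below; the device names and stripe size are
illustrative, from memory, rather than the exact ones I used:

  drive d1 device /dev/da1s1e
  drive d2 device /dev/da2s1e
  drive d3 device /dev/da3s1e
  drive d4 device /dev/da4s1e
  volume raid5
    plex org raid5 512k
      sd length 0 drive d1
      sd length 0 drive d2
      sd length 0 drive d3
      sd length 0 drive d4

fed to "vinum create <configfile>", then "vinum init" on the plex to
initialize the subdisks, and finally "newfs -v /dev/vinum/raid5".)
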
> >
> > At the moment there is no sign of a hardware error.
> > The panic itself means that the filesystem allocated a block which
> > was already allocated.  Usually that means the data on the drive
> > got corrupted while it was mounted.
> >
> > You should test the volume carefully before copying important data
> > to it.  In my experience, softupdates stresses the I/O system much
> > more than the standard way of writing, so it makes sense to test
> > with softupdates enabled.

This is a scratch box used primarily for testing -current, so I
don't have anything important on it.  Which is good, because as
you suspected it *did* crash again after a while.  At the time of
the crash it was, among other things, NFS-serving a buildworld from
another filesystem, so I'm not sure what triggered it (I only got
the trace, not any of the messages from the panic):

db> trace
Debugger(c0245ca3) at Debugger+0x35
panic(c0253e00,c3853a40,c1259500,c3853a40,0) at panic+0x70
handle_written_inodeblock(c1259500,c3853a40) at handle_written_inodeblock+0x2b8
softdep_disk_write_complete(c3853a40) at softdep_disk_write_complete+0x6a
bufdone(c3853a40,c0264ab8,c0128558,c3853a40,c0f78400) at bufdone+0x7e
bufdonebio(c3853a40) at bufdonebio+0xe
dadone(c0f6c680,c0f78400,c073a9a0,40000200,ffffffff) at dadone+0x210
camisr(c0282290,c0264b0c,c021e4e0,40000200,c022e8f6) at camisr+0x1eb
swi_cambio(40000200,c022e8f6,c022e41f,40000200,c0ec3800) at swi_cambio+0xd
splz_swi(c073a9a0,0,10,10,10) at splz_swi+0x14
Xresume9() at Xresume9+0x2b
--- interrupt, eip = 0xc0227a96, esp = 0xc0264b54, ebp = 0 ---
default_halt() at default_halt+0x2
db>
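
Next time I will try to make sure I get a proper crash dump and not
just the DDB trace.  As far as I understand, something along these
lines should do it (the dump device below is just an example, and it
needs a kernel built with debug symbols):

  # /etc/rc.conf: dump to a swap partition at least the size of RAM
  dumpdev="/dev/da0s1b"

  # after rebooting from the panic
  savecore /var/crash
  gdb -k kernel.debug vmcore.0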

> I recently fixed a number of problems in RAID-5.  Can you give me
> details of the problems you've been having, and the date of the sup?
> Since this is -CURRENT, I assume that's the version of the system too.

It is - sources from ~3 days ago, so I believe I have all your latest
fixes.

> As far as soft updates goes, basically it's incompatible with Vinum,
> since there's currently no way of ensuring the sequence of writes
> across a number of disks.  I'm thinking of ways of doing it, but they
> will cause significant loss in performance.  There should be no
> problems as long as there isn't a crash, of course :-)

I wasn't aware of the incompatibility, so all my vinum volumes
were running with SU until a few minutes ago :-)
I have now turned SU off on this R5 volume as well, but note that the
above trace happened with SU turned on.
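
In case it is useful to anyone else, turning SU off was just a matter
of unmounting the volume and running something like the following
(the mount point is of course specific to my setup):

  umount /raid5
  tunefs -n disable /dev/vinum/raid5
  mount /raid5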


> Greg


/Niels Chr.

-- 
 Niels Christian Bank-Pedersen, NCB1-RIPE.
 Network Manager, Tele Danmark NET, IP-section.

 "Hey, are any of you guys out there actually *using* RFC 2549?"

