Date:      Wed, 01 Dec 1999 15:50:05 -0500
From:      M a t a d o r <bullfighter@home.com>
To:        "Kenneth D. Merry" <ken@kdm.org>
Cc:        David Gilbert <dgilbert@velocet.ca>, stable@FreeBSD.ORG
Subject:   Re: vinum experiences.
Message-ID:  <384589FD.14CCA8BB@home.com>
References:  <199912011806.LAA43219@panzer.kdm.org>

> > While I'm still chasing the memory corruption bug in vinum, I have a
> > couple of observations.
> >
> > 1. Removing a device (at least, with the ahc controller) locks the bus
> > even though I have a RAID hot-swap ready chassis (that properly
> > isolates the bus between commands).  In my test, I had a completely
> > quiet SCSI bus when I removed one of the drives.  I then wrote to the
> > RAID array.  I got:
> >
> > Nov 30 18:31:51 raid1 /kernel: (da8:ahc1:0:11:0): Invalidating pack
> > Nov 30 18:31:51 raid1 /kernel: raid.p0.s6: fatal read I/O error
> > Nov 30 18:31:51 raid1 /kernel: vinum: raid.p0.s6 is crashed by force
> > Nov 30 18:31:52 raid1 /kernel: vinum: raid.p0 is degraded

> That looks like it may be a vinum issue.  You shouldn't be getting buffers
> done twice, as that error message indicates.  Have you talked to Greg at
> all about this?  If you're chasing down bugs in Vinum, it would make sense
> to contact the author and work with him to either find the problem, or
> trace it to some other part of the system.
> 
> > Nov 30 18:31:52 raid1 /kernel: (da8:ahc1:0:11:0): Synchronize cache failed, status == 0x4a, scsi status == 0x0
> > Nov 30 18:33:16 raid1 /kernel: (da8:ahc1:0:11:0): lost device
> > Nov 30 18:33:16 raid1 /kernel: (da8:ahc1:0:11:0): removing device entry
> >
> > ... I got more than one of the Synchronize cache failed.  the "lost
> > device" was when I "camcontrol rescan 1"  ... I did do a "camcontrol
> > reset 1", but it didn't affect things.
> 
> All of that is normal.  The synchronize cache failed since there was no
> device there to talk to.  You probably got more than one of those because
> it was retried.
> 
> > The net result is that SCSI bus 1 was wedged after this.  I would
> > conjecture that removing a device (and running with this device
> > removed is precisely what the chassis was designed to do) should not
> > wedge things.
> 
> How do you know the bus was wedged?  Could you issue SCSI commands with
> camcontrol?  e.g.:
> 
> camcontrol tur da10 -v
> 
> Will issue a test unit ready to da10.  If it responds, the bus isn't
> wedged.
> 
> > In fact, since the camcontrol rescan 1 was successful, I suggest that
> > it was cam, not the ahc driver that was somehow wedged.
> 
> I don't think it's clear at all what wedged.  The fact that you were able
> to rescan the bus indicates that the CAM side of things is probably working
> properly.  One of the things that a rescan does is send a SCSI inquiry
> command to every possible target ID on the bus.  You can't do that if the
> bus is wedged.
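
The test unit ready check quoted above could be run over several devices at once with something like the rough sketch below. It is only an illustration: `probe_devices` and the `CAMCONTROL` override are made-up names (not part of camcontrol), and it assumes `camcontrol tur` exits non-zero when the device does not answer.

```shell
#!/bin/sh
# Sketch: send a test unit ready to each named device and report the result.
# CAMCONTROL is a hypothetical override (not a camcontrol feature) so the
# function can be tried without real hardware.
# Assumption: "camcontrol tur" exits non-zero when the device does not answer.
probe_devices() {
    rc=0
    for dev in "$@"; do
        if ${CAMCONTROL:-camcontrol} tur "$dev" -v >/dev/null 2>&1; then
            echo "$dev: responds"
        else
            echo "$dev: no response"
            rc=1
        fi
    done
    return $rc
}
```

Calling `probe_devices da9 da10` would then print one line per device; if any device on the bus answers, the bus itself is presumably not wedged.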

Doesn't all of this mean that vinum's support for RAID-5 and hot-swap is
not yet 100%, or even 70%, complete?  I thought vinum didn't support
hot-swap.

I've been following this discussion and staying relatively silent as it
whooshes over my head, so feel free to ignore my comment. :)


Ciao,

Matador
matador@techie.com





