Date:      Fri, 21 Jan 2000 22:17:18 +0530
From:      Greg Lehey <grog@mojave.worldwide.lemis.com>
To:        cjclark@home.com
Cc:        John Baldwin <jhb@FreeBSD.org>, freebsd-questions@FreeBSD.org
Subject:   Re: Recoverving/reviving a 'stale' subdisk under vinum
Message-ID:  <20000121221718.C918@mojave.worldwide.lemis.com>
In-Reply-To: <20000121083402.A76063@cc942873-a.ewndsr1.nj.home.com>; from cjc@cc942873-a.ewndsr1.nj.home.com on Fri, Jan 21, 2000 at 08:34:02AM -0500
References:  <20000121105518.N481@mojave.worldwide.lemis.com> <200001210635.BAA73206@server.baldwin.cx> <20000121133435.U1123@mojave.worldwide.lemis.com> <20000121083402.A76063@cc942873-a.ewndsr1.nj.home.com>

On Friday, 21 January 2000 at  8:34:02 -0500, Crist J. Clark wrote:
> On Fri, Jan 21, 2000 at 01:34:35PM +0530, Greg Lehey wrote:
>> On Friday, 21 January 2000 at  1:35:33 -0500, John Baldwin wrote:
>>>
>>> On 21-Jan-00 Greg Lehey wrote:
>>>> On Thursday, 20 January 2000 at 19:15:43 -0500, Crist J. Clark wrote:
>>>>> On Thu, Jan 20, 2000 at 01:56:07PM -0500, John H. Baldwin wrote:
>>>>>> I've read the vinum(4) and vinum(8) manpages as well as the webpages at
>>>>>> www.lemis.com/~grog/vinum.html, and while they are very good as far as
>>>>>> setup and configuration info, I haven't been able to find a lot of info
>>>>>> about recovering.  I have a stale subdisk that I can't get to recover no
>>>>>> matter how many different start commands I try.  I've tried starting the
>>>>>> volume, the plex, and the subdisk itself with no success.
>>>>>>
>>>>>> # vinum list
>>>>>> Configuration summary
>>>>>>
>>>>>> Drives:         3 (4 configured)
>>>>>> Volumes:        1 (4 configured)
>>>>>> Plexes:         1 (8 configured)
>>>>>> Subdisks:       3 (16 configured)
>>>>>>
>>>>>> D vinumdrive0           State: up       Device /dev/da1s1e      Avail: 0/8683 MB (0%)
>>>>>> D vinumdrive1           State: up       Device /dev/da2s1e      Avail: 0/8683 MB (0%)
>>>>>> D vinumdrive2           State: up       Device /dev/da3s1e      Avail: 0/8683 MB (0%)
>>>>>>
>>>>>> V ftp_mirror            State: up       Plexes:       1 Size:         25 GB
>>>>>>
>>>>>> P ftp_mirror.p0       S State: corrupt  Subdisks:     3 Size:         25 GB
>>>>>>
>>>>>> S ftp_mirror.p0.s0      State: up       PO:        0  B Size:       8683 MB
>>>>>> S ftp_mirror.p0.s1      State: up       PO:      256 kB Size:       8683 MB
>>>>>> S ftp_mirror.p0.s2      State: stale    PO:      512 kB Size:       8683 MB
>>>>>>
>>>>>> # vinum start ftp_mirror.p0.s2
>>>>>> Can't start ftp_mirror.p0.s2: Device busy (16)
>>>>
>>>> Hmm.  That shouldn't happen.
>>>
>>> Well, that's comforting. :)
>>
>> Hmm.  Looking at this more carefully, yes, you can't do anything
>> there.  You just don't have the information to recover the subdisk.
>> I'm still debating what to do in this case; there's no way to bring it
>> back to a guaranteed consistent state here, but you *can* use the
>> 'setupstate' command to fake it.
>
> When I was having troubles with an iffy SCSI HDD a week or two ago,
> this is _exactly_ what would happen to me too, the "Device busy (16)"
> message. The only thing I found to fix it was a forced stop, and it
> seemed to always work. Sorry if it is not the ideal way to go, but it
> is what worked fine for me.

Hmm.  I suppose this is worth investigating.  It's quite possible that
the message is incorrect and should say something like "device not
accessible".
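
For reference, the "(16)" in the message is an errno value. The following
is a minimal sketch, assuming a system where errno 16 is EBUSY (as it is
on FreeBSD and Linux); the helper name describe_errno is hypothetical. It
only shows how to look the number up, and says nothing about why vinum
returns it here:

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, system message) for an errno value."""
    # errno.errorcode maps numeric values to names such as "EBUSY".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the platform's own wording, which varies by OS.
    return name, os.strerror(code)


name, message = describe_errno(16)
print(name, "-", message)
```

The message text differs between systems, so only the symbolic name is
worth comparing when matching reports like the ones in this thread.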

  True story: About 17 years ago, I was working for Tandem, and we had
  sporadic reports of customers unable to revive disk mirrors.  The
  error reported was 12 (FEINUSE, file in use), which looks pretty
  much like the thing we have here.  The first report was from
  Helsinki, the second was from Taranto in the South of Italy, and in
  each case the customer engineer was able to hide the symptoms before
  I could find the problem.

  The third time it happened in Bern, the capital of Switzerland.  I
  told the CE to do nothing, and I would be there immediately.  I
  jumped in my car, was in Basel by 7 pm, and we spent an hour or so
  debugging the disk driver.

  The reason?  It checked a flag at the beginning of the disk, which
  specified what kind of format it had, and found nothing it
  recognized, so it decided it must belong to an ancient, no longer
  used disk controller, and refused to touch it ("it belongs to
  somebody else").  In fact, the check was incorrect: if the very
  first sector of the disk had been spared, it had a different flag,
  but it didn't check for this eventuality.  A hard format got rid of
  the spare, and people were able to revive again.

>>>>> You have to 'stop' everything first. (I might be overkilling here,
>>>>> but better safe...)
>>>>
>>>> No, that's not safe.  That would mean taking down the volume.
>
> In my case it was a striped setup so once one subdisk was down, the
> whole plex was useless. There was no reason not to stop everything.

Yes, in fact this was the case here as well.

> [snip]
>>>> I haven't seen this before.  How about the information I ask for in
>>>> the web page?
>
> I have abundant /var/log/messages info from my problems. Need more
> data?

Hold on to it, but don't send it to me yet.  I'm way away from home,
and I won't be able to look at it for at least a week.

>>> Note that I didn't get this message until after the drive had been
>>> booted for a while,
>>
>> Right, that's relatively typical.
>
> Yup, that's the general type of error I was getting. I finally
> narrowed it down to one of the drives after swapping SCSI cards,
> changing all of the external cabling, swapping terminators, and
> disassembling and reassembling the two shoeboxes the drives live
> in. SCSI can be a real pain sometimes.

SCSI is not a mystery.  There are serious technical reasons why it is
occasionally necessary to sacrifice a live goat to a SCSI chain.

Greg
--
When replying to this message, please copy the original recipients.
For more information, see http://www.lemis.com/questions.html
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message
