From owner-freebsd-questions Fri Jan 21 0: 7:31 2000
Delivered-To: freebsd-questions@freebsd.org
Received: from yana.lemis.com (yana.lemis.com [192.109.197.140])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9386F15198; Fri, 21 Jan 2000 00:07:11 -0800 (PST)
	(envelope-from grog@mojave.worldwide.lemis.com)
Received: from mojave.worldwide.lemis.com ([203.197.137.157])
	by yana.lemis.com (8.8.8/8.8.8) with ESMTP id SAA06936;
	Fri, 21 Jan 2000 18:36:46 +1030 (CST)
	(envelope-from grog@mojave.worldwide.lemis.com)
Received: (from grog@localhost)
	by mojave.worldwide.lemis.com (8.9.3/8.9.3) id NAA01775;
	Fri, 21 Jan 2000 13:34:35 +0530 (IST)
	(envelope-from grog)
Date: Fri, 21 Jan 2000 13:34:35 +0530
From: Greg Lehey
To: John Baldwin
Cc: freebsd-questions@FreeBSD.org, cjclark@home.com
Subject: Re: Recoverving/reviving a 'stale' subdisk under vinum
Message-ID: <20000121133435.U1123@mojave.worldwide.lemis.com>
Reply-To: Greg Lehey
References: <20000121105518.N481@mojave.worldwide.lemis.com> <200001210635.BAA73206@server.baldwin.cx>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <200001210635.BAA73206@server.baldwin.cx>; from jhb@FreeBSD.org on Fri, Jan 21, 2000 at 01:35:33AM -0500
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-41-739-7062
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Friday, 21 January 2000 at 1:35:33 -0500, John Baldwin wrote:
>
> On 21-Jan-00 Greg Lehey wrote:
>> On Thursday, 20 January 2000 at 19:15:43 -0500, Crist J. Clark wrote:
>>> On Thu, Jan 20, 2000 at 01:56:07PM -0500, John H.
Baldwin wrote:
>>>> I've read the vinum(4) and vinum(8) manpages as well as the webpages at
>>>> www.lemis.com/~grog/vinum.html, and while they are very good as far as
>>>> setup and configuration info, I haven't been able to find a lot of info
>>>> about recovering. I have a stale subdisk that I can't get to recover no
>>>> matter how many different start commands I try. I've tried starting the
>>>> volume, the plex, and the subdisk itself with no success.
>>>>
>>>> # vinum list
>>>> Configuration summary
>>>>
>>>> Drives:    3 (4 configured)
>>>> Volumes:   1 (4 configured)
>>>> Plexes:    1 (8 configured)
>>>> Subdisks:  3 (16 configured)
>>>>
>>>> D vinumdrive0        State: up       Device /dev/da1s1e  Avail: 0/8683 MB (0%)
>>>> D vinumdrive1        State: up       Device /dev/da2s1e  Avail: 0/8683 MB (0%)
>>>> D vinumdrive2        State: up       Device /dev/da3s1e  Avail: 0/8683 MB (0%)
>>>>
>>>> V ftp_mirror         State: up       Plexes: 1           Size: 25 GB
>>>>
>>>> P ftp_mirror.p0    S State: corrupt  Subdisks: 3         Size: 25 GB
>>>>
>>>> S ftp_mirror.p0.s0   State: up       PO: 0 B             Size: 8683 MB
>>>> S ftp_mirror.p0.s1   State: up       PO: 256 kB          Size: 8683 MB
>>>> S ftp_mirror.p0.s2   State: stale    PO: 512 kB          Size: 8683 MB
>>>>
>>>> # vinum start ftp_mirror.p0.s2
>>>> Can't start ftp_mirror.p0.s2: Device busy (16)
>>
>> Hmm. That shouldn't happen.
>
> Well, that's comforting. :)

Hmm. Looking at this more carefully, yes, you can't do anything
there. You just don't have the information to recover the subdisk.
I'm still debating what to do in this case; there's no way to bring it
back to a guaranteed consistent state here, but you *can* use the
'setupstate' command to fake it.

>>> You have to 'stop' everything first. (I might be overkilling here,
>>> but better safe...)
>>
>> No, that's not safe. That would mean taking down the volume.
>
> Err, oops. I already did this and it worked. I've already fsck'd
> the volume and have it in use right now.
>
>> I haven't seen this before. How about the information I ask for in
>> the web page?
>
> Ok, here's what I do have, but I did fix it using the above
> hackishness, so some of it may not apply.
>
> the output of 'vinum list' you already have above, here's some of
> vinum_history, although it doesn't include any of the return values,
> so I don't think it will be of much use:
>
> 20 Jan 2000 12:39:55.489661 *** vinum started ***
> 20 Jan 2000 12:39:55.540632 start
> 20 Jan 2000 12:39:55.820518 *** Created devices ***
> 20 Jan 2000 12:40:12.649217 *** vinum started ***
> 20 Jan 2000 12:40:13.502406 help
> 20 Jan 2000 12:40:25.188145 ls
> 20 Jan 2000 13:10:31.321216 start
> 20 Jan 2000 13:10:47.978917 start ftp_mirror.p0.s2
> 20 Jan 2000 13:10:50.980012 stop
>
> That is what I did when I first brought the machine back up.
>
> 20 Jan 2000 16:21:53.536302 *** vinum started ***
> 20 Jan 2000 16:21:53.537010 stop ftp_mirror.p0
> 20 Jan 2000 16:21:58.984393 *** vinum started ***

Hmm. Interesting. I don't seem to log a 'vinum stop'.

> 20 Jan 2000 16:21:58.985133 list
> 20 Jan 2000 16:22:06.561902 *** vinum started ***
> 20 Jan 2000 16:22:06.562622 stop ftp_mirror.p0.s2
> 20 Jan 2000 16:22:17.000952 *** vinum started ***
> 20 Jan 2000 16:22:17.005242 stop -f ftp_mirror.p0.s2
> 20 Jan 2000 16:22:21.145993 *** vinum started ***
> 20 Jan 2000 16:22:21.146744 list
> 20 Jan 2000 16:22:40.709634 *** vinum started ***
> 20 Jan 2000 16:22:40.710394 start ftp_mirror
> 20 Jan 2000 16:22:54.393075 *** vinum started ***
> 20 Jan 2000 16:22:54.393778 start ftp_mirror.p0.s0
> 20 Jan 2000 16:23:00.238272 *** vinum started ***
> 20 Jan 2000 16:23:00.239015 list
> 20 Jan 2000 16:23:09.552251 *** vinum started ***
> 20 Jan 2000 16:23:09.552963 start ftp_mirror.p0.s1
> 20 Jan 2000 16:23:16.193159 *** vinum started ***
> 20 Jan 2000 16:23:16.193896 start ftp_mirror.p0.s2
>
> That is how I "fixed" it.

I don't see the volume being stopped there. Of course, it's not so
important not to stop a volume if it's only partially accessible.
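
A history excerpt like the one quoted above can be checked mechanically
for which objects were stopped and when. The following is only a rough
sketch in Python, assuming every vinum_history line has the shape seen
in the excerpt (day, month name, year, time with microseconds, then the
command) and that the "> " quote prefix has already been stripped. The
name parse_history is hypothetical and not part of vinum.

```python
from datetime import datetime

# Sample lines copied from the quoted vinum_history excerpt,
# with the mail quote prefix removed.
SAMPLE = """\
20 Jan 2000 16:21:53.536302 *** vinum started ***
20 Jan 2000 16:21:53.537010 stop ftp_mirror.p0
20 Jan 2000 16:22:17.005242 stop -f ftp_mirror.p0.s2
20 Jan 2000 16:22:40.710394 start ftp_mirror
"""


def parse_history(text):
    """Yield (timestamp, command) pairs, skipping '***' marker lines.

    Hypothetical helper: the line layout is assumed from the excerpt,
    not taken from vinum documentation.
    """
    for line in text.splitlines():
        # Four timestamp fields, then the rest of the line as the command.
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        day, month, year, clock, command = fields
        if command.startswith("***"):
            continue
        stamp = datetime.strptime(
            f"{day} {month} {year} {clock}", "%d %b %Y %H:%M:%S.%f"
        )
        yield stamp, command


if __name__ == "__main__":
    # Print only the 'stop' commands with their timestamps.
    for stamp, command in parse_history(SAMPLE):
        if command.split()[0] == "stop":
            print(stamp.isoformat(), command)
```

Run against the full history file, this lists every explicit stop
command, which makes it easier to see whether the volume itself or only
a plex or subdisk was stopped.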
> However, the drive seems to have fallen over again (*sigh*) with the
> following kernel messages:
>
> Jan 20 23:28:38 raven /kernel: (da2:ahc1:0:1:0): SCB 0x96 - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0xa
> Jan 20 23:28:38 raven /kernel: (da2:ahc1:0:1:0): Queuing a BDR SCB
> Jan 20 23:28:38 raven /kernel: (da2:ahc1:0:1:0): Bus Device Reset Message Sent
> Jan 20 23:28:38 raven /kernel: (da2:ahc1:0:1:0): no longer in timeout, status = 34b
> Jan 20 23:28:38 raven /kernel: ahc1: Bus Device Reset on A:1. 1 SCBs aborted

Yup, that looks like a hardware problem; possibly bus termination or
some such. Vinum is good at finding suboptimal SCSI chains, since it
issues multiple requests in parallel.

> Note that I didn't get this message until after the drive had been
> booted for a while,

Right, that's relatively typical.

> the kernel found it fine during boot:

Greg
--
When replying to this message, please copy the original recipients.
For more information, see http://www.lemis.com/questions.html
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message