From owner-freebsd-stable@FreeBSD.ORG Wed Mar 30 02:57:35 2005
Date: Tue, 29 Mar 2005 18:57:35 -0800 (PST)
From: Doug White
To: Graham Menhennitt
cc: freebsd-stable@freebsd.org
Subject: Re: "ffs_mountroot: can't find rootvp" after cvsup and making worldfmen
Message-ID: <20050329184539.C58510@carver.gumbysoft.com>
In-Reply-To: <4247AFDB.1060307@optusnet.com.au>
References: <42436771.3060006@optusnet.com.au>
 <20050325133558.U16071@carver.gumbysoft.com>
 <20050327130409.F35584@carver.gumbysoft.com>
 <4247AFDB.1060307@optusnet.com.au>

On Mon, 28 Mar 2005, Graham Menhennitt wrote:

> I compared the output of "boot -v" for the working and broken kernels.
> It seems that the broken one does fewer loops around the disk probe and
> hence has less lines of
>     ata0-master: stat=0x90 err=0x90 lsb=0x90 msb=0x90

You know, that looks like 0xd0 with some masking...

> than the one that works.
> Since that line comes from ata-lowlevel.c, I
> cvs'ed versions of that file going back to around when I built the
> working kernel. The following seems to be the change that broke it.

This is the delta to rev 1.51 of src/sys/dev/ata/ata-lowlevel.c. That adds
a condition that is supposed to detect an empty channel. Now why your
controller says the channel is empty and somehow becomes un-empty later is
a good question.

>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> --- ata-lowlevel.c      Mon Mar 28 15:59:57 2005
> +++ ata-lowlevel.c_orig Wed Mar 23 19:17:46 2005
> @@ -605,19 +605,26 @@
>             }
>         }
>         if (mask == 0x01)       /* wait for master only */
> -           if (!(stat0 & ATA_S_BUSY) || (stat0 == 0xff && timeout > 5))
> +           if (!(stat0 & ATA_S_BUSY) || (stat0 == 0xff && timeout > 5) ||
> +               (stat0 == err && lsb == err && msb == err && timeout > 5))
>                 break;
>         if (mask == 0x02)       /* wait for slave only */
> -           if (!(stat1 & ATA_S_BUSY) || (stat1 == 0xff && timeout > 5))
> +           if (!(stat1 & ATA_S_BUSY) || (stat1 == 0xff && timeout > 5) ||
> +               (stat1 == err && lsb == err && msb == err && timeout > 5))
>                 break;
>         if (mask == 0x03) {     /* wait for both master & slave */
>             if (!(stat0 & ATA_S_BUSY) && !(stat1 & ATA_S_BUSY))
>                 break;
> -           if (stat0 == 0xff && timeout > 5)
> +           if ((stat0 == 0xff && timeout > 5) ||
> +               (stat0 == err && lsb == err && msb == err && timeout > 5))
>                 mask &= ~0x01;
> -           if (stat1 == 0xff && timeout > 5)
> +           if ((stat1 == 0xff && timeout > 5) ||
> +               (stat1 == err && lsb == err && msb == err && timeout > 5))
>                 mask &= ~0x02;
>         }
> +       if (mask == 0 && !(stat0 & ATA_S_BUSY) && !(stat1 & ATA_S_BUSY))
> +           break;
> +
>         ata_udelay(100000);
>      }
> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
> Anyway, I now have a working kernel. I presume that I should file a PR
> on this.

Yes please.

Do you have a long delay at the point where the bogus messages are printed
in the newer kernel, but not in the older one?
The change implies that it will get out of a busted channel faster, but
your disk apparently needs a longer delay. If it's hanging for the full
30s on the working kernel then that would explain why shortening the
delay ends up with a missing disk.

If you want to try another workaround, double the delay in
ata_udelay(100000); and then make it progressively longer until your disk
reappears. (You may want to reduce the for loop's exit condition on
timeout, since it'll wait 310 iterations.) If that doesn't work, start
increasing the DELAY()s.

You might also check for a drive firmware update.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org
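
The reasoning above can be sketched in a few lines of C. This is a
simplified model of the "wait for master only" branch, not the driver code:
fake_reg(), wait_master(), ready_at, empty_check and MAX_POLLS are
hypothetical names made up for the illustration, and the drive is assumed
to return 0x90 in every register until it finally comes ready, as in the
boot -v output quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define ATA_S_BUSY 0x80  /* busy bit of the ATA status register */
#define MAX_POLLS  310   /* loop bound mentioned in the message above */

/* Hypothetical slow drive: reads 0x90 (busy) until poll number `ready_at`,
 * then returns `ready_val`. */
static uint8_t fake_reg(int poll, int ready_at, uint8_t ready_val)
{
    return poll < ready_at ? 0x90 : ready_val;
}

/* Simplified model of the master-only wait loop.
 * empty_check selects whether the newer "all registers equal" test is used.
 * Returns the poll count at which the loop exits; *found is set to true
 * only if the drive was seen not busy. */
int wait_master(int ready_at, bool empty_check, bool *found)
{
    int timeout;

    *found = false;
    for (timeout = 0; timeout < MAX_POLLS; timeout++) {
        uint8_t stat0 = fake_reg(timeout, ready_at, 0x50);
        uint8_t err   = fake_reg(timeout, ready_at, 0x01);
        uint8_t lsb   = fake_reg(timeout, ready_at, 0x00);
        uint8_t msb   = fake_reg(timeout, ready_at, 0x00);

        if (!(stat0 & ATA_S_BUSY)) {        /* drive came ready */
            *found = true;
            break;
        }
        if (stat0 == 0xff && timeout > 5)   /* floating bus: nothing there */
            break;
        if (empty_check &&
            stat0 == err && lsb == err && msb == err && timeout > 5)
            break;                          /* newer test: treat as empty */
        /* the real loop sleeps here: ata_udelay(100000) */
    }
    return timeout;
}
```

With a drive that needs 20 polls, wait_master(20, false, &found) leaves the
loop at poll 20 with found set, while wait_master(20, true, &found) gives
up at poll 6 with found clear. At 100000 us per poll, and ignoring the
extra DELAY()s in the real loop, that is roughly 2 s of patience versus
0.6 s, which is why lengthening the per-poll delay could give a slow drive
enough time before the empty-channel test fires.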