Date:      Tue, 29 Mar 2005 18:57:35 -0800 (PST)
From:      Doug White <dwhite@gumbysoft.com>
To:        Graham Menhennitt <gmenhennitt@optusnet.com.au>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: "ffs_mountroot: can't find rootvp" after cvsup and making world
Message-ID:  <20050329184539.C58510@carver.gumbysoft.com>
In-Reply-To: <4247AFDB.1060307@optusnet.com.au>
References:  <42436771.3060006@optusnet.com.au> <20050325133558.U16071@carver.gumbysoft.com> <20050327130409.F35584@carver.gumbysoft.com> <4247AFDB.1060307@optusnet.com.au>

On Mon, 28 Mar 2005, Graham Menhennitt wrote:

> I compared the output of "boot -v" for the working and broken kernels.
> It seems that the broken one does fewer loops around the disk probe and
> hence has fewer lines of
>     ata0-master: stat=0x90 err=0x90 lsb=0x90 msb=0x90

You know, that looks like 0xd0 with some masking...
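To make the bit arithmetic concrete, here is a minimal sketch that decodes a
status byte. The bit values are the standard ATA status register bits, named
in the ATA_S_* style of the diff quoted further down; the two helper
functions are hypothetical and exist only for this illustration. Reading
0x90 as 0xd0 with one bit cleared is a guess, not a diagnosis.

```c
#include <stdint.h>
#include <stdio.h>

/* Standard ATA status register bits, named in the ATA_S_* style. */
#define ATA_S_ERROR 0x01 /* error */
#define ATA_S_INDEX 0x02 /* index */
#define ATA_S_CORR  0x04 /* data corrected */
#define ATA_S_DRQ   0x08 /* data request */
#define ATA_S_DSC   0x10 /* drive seek completed */
#define ATA_S_DWF   0x20 /* drive write fault */
#define ATA_S_READY 0x40 /* drive ready */
#define ATA_S_BUSY  0x80 /* busy */

/* Hypothetical helper: return the bits that differ between two status bytes. */
static uint8_t
status_diff(uint8_t a, uint8_t b)
{
    return (uint8_t)(a ^ b);
}

/* Hypothetical helper: print the names of the bits set in a status byte. */
static void
print_status(uint8_t stat)
{
    static const char *names[8] = {
        "ERROR", "INDEX", "CORR", "DRQ", "DSC", "DWF", "READY", "BUSY"
    };

    printf("0x%02x:", stat);
    for (int bit = 7; bit >= 0; bit--)
        if (stat & (1u << bit))
            printf(" %s", names[bit]);
    printf("\n");
}
```

print_status(0x90) prints "0x90: BUSY DSC" and print_status(0xd0) prints
"0xd0: BUSY READY DSC", so the two values differ only in bit 0x40
(ATA_S_READY), which is what status_diff(0xd0, 0x90) returns.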

> than the one that works. Since that line comes from ata-lowlevel.c, I
> cvs'ed versions of that file going back to around when I built the
> working kernel. The following seems to be the change that broke it.

This is the delta to rev 1.51 of src/sys/dev/ata/ata-lowlevel.c.  That
revision adds a condition that is supposed to detect an empty channel. Now
why your controller says the channel is empty and then somehow becomes
un-empty later is a good question.
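As a rough illustration of what that added condition does for the
master-only case, here is a standalone sketch. It is not the driver code:
the function names are hypothetical, and only the comparisons are taken from
the quoted diff. With all four registers reading 0x90, the channel is
treated as empty once the loop counter passes 5.

```c
#include <stdbool.h>
#include <stdint.h>

#define ATA_S_BUSY 0x80 /* busy bit of the ATA status register */

/*
 * Hypothetical helper: the "empty channel" heuristic from the diff.
 * A channel is considered empty once timeout > 5 and either the status
 * reads 0xff or status, error, lsb and msb all read the same value.
 */
static bool
channel_looks_empty(uint8_t stat, uint8_t err, uint8_t lsb, uint8_t msb,
    int timeout)
{
    if (timeout <= 5)
        return false;
    return stat == 0xff || (stat == err && lsb == err && msb == err);
}

/*
 * Hypothetical helper: exit test for the "wait for master only" case.
 * The wait ends when the device is no longer busy or the channel is
 * judged empty.
 */
static bool
master_wait_done(uint8_t stat0, uint8_t err, uint8_t lsb, uint8_t msb,
    int timeout)
{
    return !(stat0 & ATA_S_BUSY) ||
        channel_looks_empty(stat0, err, lsb, msb, timeout);
}
```

For the values in the boot -v output, master_wait_done(0x90, 0x90, 0x90,
0x90, 6) is true, so the loop gives up on the sixth pass, while the variant
without the added comparison would keep polling because BUSY is still set.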

>
>  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> --- ata-lowlevel.c    Mon Mar 28 15:59:57 2005
> +++ ata-lowlevel.c_orig    Wed Mar 23 19:17:46 2005
> @@ -605,19 +605,26 @@
>          }
>      }
>      if (mask == 0x01)    /* wait for master only */
> -        if (!(stat0 & ATA_S_BUSY) || (stat0 == 0xff && timeout > 5))
> +        if (!(stat0 & ATA_S_BUSY) || (stat0 == 0xff && timeout > 5) ||
> +        (stat0 == err && lsb == err && msb == err && timeout > 5))
>          break;
>      if (mask == 0x02)    /* wait for slave only */
> -        if (!(stat1 & ATA_S_BUSY) || (stat1 == 0xff && timeout > 5))
> +        if (!(stat1 & ATA_S_BUSY) || (stat1 == 0xff && timeout > 5) ||
> +        (stat1 == err && lsb == err && msb == err && timeout > 5))
>          break;
>      if (mask == 0x03) {    /* wait for both master & slave */
>          if (!(stat0 & ATA_S_BUSY) && !(stat1 & ATA_S_BUSY))
>          break;
> -        if (stat0 == 0xff && timeout > 5)
> +        if ((stat0 == 0xff && timeout > 5) ||
> +        (stat0 == err && lsb == err && msb == err && timeout > 5))
>          mask &= ~0x01;
> -        if (stat1 == 0xff && timeout > 5)
> +        if ((stat1 == 0xff && timeout > 5) ||
> +        (stat1 == err && lsb == err && msb == err && timeout > 5))
>          mask &= ~0x02;
>      }
> +    if (mask == 0 && !(stat0 & ATA_S_BUSY) && !(stat1 & ATA_S_BUSY))
> +        break;
> +
>      ata_udelay(100000);
>      }
>  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
> Anyway, I now have a working kernel. I presume that I should file a PR
> on this.

Yes please.

Do you see a long delay at the point where the bogus messages are printed
in the older kernel, but not in the newer one? The change implies that it
will get out of a busted channel faster, but your disk apparently needs a
longer delay.  If it's hanging for the full 30s on the working kernel then
that would explain why shortening the delay ends up with a missing disk.

If you want to try another workaround, double the ata_udelay(100000);
value, and then make it progressively longer until your disk reappears.
(You may want to reduce the for loop's exit condition on timeout, since
it'll wait 310 iterations.) If that doesn't work, start increasing the
DELAY()s.
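The arithmetic behind that suggestion can be sketched as follows. Both
helpers are hypothetical, and the figures ignore the additional DELAY()
calls inside the loop, so treat the results as lower bounds rather than
exact timings.

```c
#include <stdint.h>

/*
 * Hypothetical helper: worst-case time, in milliseconds, spent sleeping
 * in a loop that runs `iterations` times and sleeps `delay_us` per pass.
 */
static uint64_t
total_wait_ms(unsigned iterations, uint64_t delay_us)
{
    return (uint64_t)iterations * delay_us / 1000;
}

/*
 * Hypothetical helper: how many iterations keep the worst case at about
 * `cap_ms` milliseconds for a given per-pass delay. delay_us must be
 * non-zero.
 */
static unsigned
iterations_for_cap(uint64_t cap_ms, uint64_t delay_us)
{
    return (unsigned)(cap_ms * 1000 / delay_us);
}
```

With the current values, total_wait_ms(310, 100000) is 31000 ms. Doubling
the delay alone gives total_wait_ms(310, 200000) = 62000 ms, whereas
iterations_for_cap(31000, 200000) = 155 iterations keeps the worst case
near the original 31 seconds.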

You might also check for a drive firmware update.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20050329184539.C58510>