Date:      Mon, 9 May 2011 14:48:51 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        David Naylor <naylor.b.david@gmail.com>
Cc:        Alexander Motin <mav@freebsd.org>, FreeBSD-Current <freebsd-current@freebsd.org>
Subject:   Re: [regression] unable to boot: no GEOM devices found.
Message-ID:  <201105091448.51961.jhb@freebsd.org>
In-Reply-To: <201105092024.41588.naylor.b.david@gmail.com>
References:  <mailpost.1302585106.8448174.20731.mailing.freebsd.current@FreeBSD.cs.nctu.edu.tw> <201104152329.59294.naylor.b.david@gmail.com> <201105092024.41588.naylor.b.david@gmail.com>

On Monday, May 09, 2011 2:24:37 pm David Naylor wrote:
> On Friday 15 April 2011 23:29:55 David Naylor wrote:
> > On Friday 15 April 2011 18:28:06 John Baldwin wrote:
> > > On Wednesday, April 13, 2011 1:07:06 pm David Naylor wrote:
> > > > On Tuesday 12 April 2011 22:12:55 Alexander Motin wrote:
> > > > > David Naylor wrote:
> > > > > > On Tuesday 12 April 2011 08:17:51 Alexander Motin wrote:
> > > > > >> David Naylor wrote:
> > > > > >>> I am running -current and since a few days ago (at least
> > > > > >>> 2011/04/11) I am unable to boot.
> > > > > >>> 
> > > > > >>> The boot process stops when it looks to find a bootable device.
> > > > > > >>> The prompt (when pressing '?') does not display any device and
> > > > > > >>> yielding one second (or more) to the kernel (by pressing '.')
> > > > > > >>> does not improve the situation.
> > > > > >>> 
> > > > > >>> A known working date is 2011/02/20.
> > > > > >>> 
> > > > > >>> I am running amd64 on a nVidia MCP51 chipset.
> > > > > >> 
> > > > > >> MCP51... again...
> > > > > 
> > > > > +ata2: reiniting channel ..
> > > > > +ata2: SATA connect time=0ms status=00000113
> > > > > +ata2: reset tp1 mask=01 ostat0=58 ostat1=00
> > > > > +ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> > > > > +ata2: reset tp2 stat0=50 stat1=00 devices=0x1
> > > > > +ata2: reinit done ..
> > > > > +unknown: FAILURE - ATA_IDENTIFY timed out LBA=0
> > > > > 
> > > > > Since all devices are detected but do not respond to commands, I
> > > > > suspect that something is wrong with ATA interrupts.
> > > > > There is a long history of interrupt problems with this chipset. I
> > > > > have already tried to debug one case where ATA wasn't generating
> > > > > interrupts at all, unfortunately without success: requests were
> > > > > executing but not generating interrupts, and it didn't look like an
> > > > > ATA driver problem.
> > > > > 
> > > > > As for a possible candidate for the revision triggering your problem,
> > > > > I would look at this message:
> > > > > +pcib0: Enabling MSI window for HyperTransport slave at pci0:0:9:0
> > > > > 
> > > > > At least it is recent (SVN revs 219737,219740 on 2011-03-18 by jhb)
> > > > > and it is interrupt related.
> > > > 
> > > > I reverted those two revs and everything works again.
> > > 
> > > Hmm, can you provide a full boot verbose dmesg?  Alternatively, can you
> > > see if the device at pci0:0:9:0 is a PCI-PCI bridge?
> > 
> > I can provide a verbose dmesg if the following is not enough:
> > 
> > none17@pci0:0:9:0:      class=0x050000 card=0x50011458 chip=0x027010de
> > rev=0xa2 hdr=0x00
> >     vendor     = 'NVIDIA Corporation'
> >     device     = 'MCP51 Host Bridge'
> >     class      = memory
> >     subclass   = RAM
> > 
> > I see two PCI-PCI bridges at pci0:0:3:0 and pci0:0:16:0.  I've attached
> > the full `pciconf -lv` output.
> 
> FYI, this issue is still present on current (~24 hours old).  Reverting the  
> above mentioned revisions still fixes the problem.  

Yes, I'm still chewing on how best to fix this.  The problem is that for the 
most part we should enable the MSI mapping window everywhere, but for certain 
broken Nvidia chipsets it seems that doing so breaks INTx interrupts and we 
need to not enable it (and disable MSI globally) on those chipsets.  Linux has 
some grotty code to allow PCI devices to figure out which Host Bridge device 
on PCI bus 0 is the real host bridge for each HT slave and to selectively 
enable it in the host bridge when an MSI interrupt is first enabled.

They also have a quirk to disable MSI altogether on certain Nvidia chipsets if 
the MSI mapping window is not enabled by the BIOS.  I attempted to implement 
the latter, but it broke perfectly good Nvidia chipsets on older ppc-based 
Macs.  I think I want to just disable MSI entirely on busted chipsets like 
yours, but I need to come up with a good way to detect your chipset (and 
similar).
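
To illustrate the detection question, here is a rough userland sketch (not the
eventual kernel fix) of one possible approach: matching the host bridge by its
PCI vendor/device ID against a quirk list.  It parses the `pciconf -l` line
format quoted earlier in this thread, where the low 16 bits of `chip` are the
vendor ID and the high 16 bits are the device ID.  The list
MSI_BROKEN_HOST_BRIDGES and all function names are hypothetical; the only
entry is the MCP51 host bridge reported above, and whether an ID match is
sufficient to identify the affected chipsets is an open assumption.

```python
import re

# Hypothetical quirk list of (vendor_id, device_id) pairs for host bridges
# on which MSI would be disabled.  Only the MCP51 entry comes from this
# thread (chip=0x027010de); it is not taken from any real kernel table.
MSI_BROKEN_HOST_BRIDGES = {
    (0x10DE, 0x0270),  # NVIDIA MCP51 Host Bridge
}


def parse_pciconf_line(line):
    """Decode the name=0x... fields of one `pciconf -l` line as quoted above."""
    fields = dict(re.findall(r"(\w+)=0x([0-9a-fA-F]+)", line))
    chip = int(fields["chip"], 16)
    pci_class = int(fields["class"], 16)
    return {
        "vendor": chip & 0xFFFF,            # low 16 bits of chip
        "device": (chip >> 16) & 0xFFFF,    # high 16 bits of chip
        "base_class": (pci_class >> 16) & 0xFF,
        "subclass": (pci_class >> 8) & 0xFF,
        "header_type": int(fields["hdr"], 16),
    }


def is_pci_pci_bridge(dev):
    """Base class 0x06 (bridge), subclass 0x04 (PCI-PCI), header type 1."""
    return (
        dev["base_class"] == 0x06
        and dev["subclass"] == 0x04
        and (dev["header_type"] & 0x7F) == 0x01
    )


def msi_should_be_disabled(dev):
    """True if the device matches the hypothetical quirk list."""
    return (dev["vendor"], dev["device"]) in MSI_BROKEN_HOST_BRIDGES


if __name__ == "__main__":
    line = ("none17@pci0:0:9:0: class=0x050000 card=0x50011458 "
            "chip=0x027010de rev=0xa2 hdr=0x00")
    dev = parse_pciconf_line(line)
    print("vendor=0x%04x device=0x%04x" % (dev["vendor"], dev["device"]))
    print("PCI-PCI bridge:", is_pci_pci_bridge(dev))
    print("disable MSI:", msi_should_be_disabled(dev))
```

For the device at pci0:0:9:0 this decodes class 0x05 (memory) with header
type 0, so it is not a PCI-PCI bridge, which agrees with the pciconf output
quoted above.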

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201105091448.51961.jhb>