Date:      Tue, 12 Apr 2011 23:12:55 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        David Naylor <naylor.b.david@gmail.com>
Cc:        FreeBSD-Current <freebsd-current@freebsd.org>
Subject:   Re: [regression] unable to boot: no GEOM devices found.
Message-ID:  <4DA4B247.6010901@FreeBSD.org>
In-Reply-To: <201104122132.23809.naylor.b.david@gmail.com>
References:  <mailpost.1302585106.8448174.20731.mailing.freebsd.current@FreeBSD.cs.nctu.edu.tw> <4DA3EE8F.8050306@FreeBSD.org> <201104122132.23809.naylor.b.david@gmail.com>

David Naylor wrote:
> On Tuesday 12 April 2011 08:17:51 Alexander Motin wrote:
>> David Naylor wrote:
>>> I am running -current and since a few days ago (at least 2011/04/11) I am
>>> unable to boot.
>>>
>>> The boot process stops when it looks to find a bootable device.  The
>>> prompt (when pressing '?') does not display any device and yielding one
>>> second (or more) to the kernel (by pressing '.') does not improve the
>>> situation.
>>>
>>> A known working date is 2011/02/20.
>>>
>>> I am running amd64 on a nVidia MCP51 chipset.
>> MCP51... again...
>>
>>> I am willing to help any way I can.
>> You could start from capturing and showing verbose dmesg. Full or at
>> least in parts related to disks.
> 
> I captured the dmesg output for both the old (working) kernel and the new 
> (bad) kernel.  See attached for the difference between the two.  If you need 
> the full dmesg please let me know.  
> 
> One thing I found is that the old kernel would not boot if I simply rebooted 
> from the bad kernel.  I had to do a hard power off before the old kernel would 
> work again.  Is some device state surviving between reboots?  

+ata2: reiniting channel ..
+ata2: SATA connect time=0ms status=00000113
+ata2: reset tp1 mask=01 ostat0=58 ostat1=00
+ata2: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
+ata2: reset tp2 stat0=50 stat1=00 devices=0x1
+ata2: reinit done ..
+unknown: FAILURE - ATA_IDENTIFY timed out LBA=0

Since all devices are detected but do not respond to commands, I would
suppose that something is wrong with ATA interrupts. This chipset has a
long history of interrupt problems. I have already tried to debug one
case where ATA wasn't generating interrupts at all. Unfortunately,
without success: requests were executing but not generating interrupts,
and it did not look like an ATA driver problem.
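
For readers following the log excerpt above, here is a minimal sketch of
how its register values decode. It assumes the classic ATA status
register bit layout and the SATA SStatus field layout (DET, SPD, IPM).
The function names are hypothetical and are not part of the FreeBSD ata
driver.

```python
# Classic ATA status register bits (assumed layout; the names are the
# conventional ones, not necessarily the driver's own macros).
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data
    0x02: "IDX",   # index
    0x01: "ERR",   # error
}


def decode_ata_status(value):
    """Return the names of the status bits set in value (hypothetical helper)."""
    return [name for bit, name in ATA_STATUS_BITS.items() if value & bit]


def decode_sstatus(value):
    """Split a SATA SStatus value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,         # 3: device present, PHY link established
        "spd": (value >> 4) & 0xF,  # 1: first-generation link speed
        "ipm": (value >> 8) & 0xF,  # 1: interface in active state
    }


print(decode_ata_status(0x58))      # ostat0 from the "reset tp1" line
print(decode_ata_status(0x50))      # stat0 after the reset
print(decode_sstatus(0x00000113))   # status from the "SATA connect" line
```

Under these assumptions, status=00000113 means the link is up, and
stat0=0x50 is DRDY plus DSC with neither BSY nor ERR set, so the device
looks present and ready. That fits a detected device whose
ATA_IDENTIFY still times out.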

As for a possible candidate for the revision triggering your problem, I
would look at this message:
+pcib0: Enabling MSI window for HyperTransport slave at pci0:0:9:0

At least it is recent (SVN revs 219737,219740 on 2011-03-18 by jhb) and
it is interrupt-related.

-- 
Alexander Motin
