Date:      Thu, 02 Jun 2011 11:37:33 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        "stable@freebsd.org" <stable@freebsd.org>
Subject:   Re: 8-STABLE won't boot with ZFSv28
Message-ID:  <4DE74BCD.8080002@FreeBSD.org>
In-Reply-To: <20110602075118.GA42026@icarus.home.lan>
References:  <814C9E9472FDCC40AAC3FC95A2D67E3B0BD88C69@msx3.exchange.alogis.com> <20110601085454.GA19434@icarus.home.lan> <814C9E9472FDCC40AAC3FC95A2D67E3B0BD88DC0@msx3.exchange.alogis.com> <20110601095610.GA20255@icarus.home.lan> <814C9E9472FDCC40AAC3FC95A2D67E3B0BD88F48@msx3.exchange.alogis.com> <814C9E9472FDCC40AAC3FC95A2D67E3B0BD890BD@msx3.exchange.alogis.com> <4DE73386.5040505@FreeBSD.org> <20110602075118.GA42026@icarus.home.lan>

Jeremy Chadwick wrote:
> On Thu, Jun 02, 2011 at 09:53:58AM +0300, Alexander Motin wrote:
>> Holger Kipp wrote:
>>> got the same messages over and over again - panic took some time:
>>>
>>> unknown: WARNING - ATAPI_IDENTIFY requeued due to channel reset LBA=0
>>> ata0: reinit done ..
>>> ata0: reiniting channel ..
>>> ata0: DISCONNECT requested
>>>
>>> <short delay here>
>>>
>>> ata0: p0: SATA connect time=0ms status=00000113
>>> ata0: p1: SATA connect timeout status=00000000
>>> ata0: reset tp1 mask=03 ostat0=00 ostat1=00
>>> ata0: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
>>> ata0: stat1=0x00 err=0x01 lsb=0x14 msb=0xeb
>>> ata0: reset tp2 stat0=00 stat1=00 devices=0x30000
>>> unknown: WARNING - ATAPI_IDENTIFY requeued due to channel reset LBA=0
>>> ata0: reinit done ..
>>> ata0: reiniting channel ..
>>> ata0: DISCONNECT requested
>> I see two problems here:
>>  1. "devices=0x30000" means that two ATAPI devices were detected instead
>> of one. I can reproduce it also with other Intel chipsets. It looks like
>> a hardware bug to me. It can be workarounded by reconnecting ATAPI
>> device to even (2 or 4) SATA port, or connecting any other device there.
>>  2. "DISCONNECT requested" means that controller reported PHY status
>> change for some device on channel, triggering infinite retry. Unluckily
>> I have no ICH9 board, while I can't reproduce it with ICH10 or above.
>>
>> This patch should workaround the first problem in software:
>> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/chipsets/ata-intel.c.diff?r1=1.25;r2=1.26
>> Try it please and let's see if with some luck it do something about the
>> second problem.
> 
> With regards to item #1: I don't see anything in the ICH9 errata that
> indicates a silicon bug if the only device attached to the controller is
> an ATAPI device and connected to SATA port 0 (presumably), or an
> odd-numbered port?  If this problem exists on other ICHxx and/or ESBxx
> chips, I sure would hope it'd be documented.
> 
> I haven't tried confirming it myself, but if need be I can set up a test
> box with a SATA-based DVD drive hooked up to it + provide remote serial
> console/etc. if it'd be of any help.  I don't think it would be (sounds
> like you have lots of hardware :-) ), but I'm willing to help in any way
> I can.

Intel probably doesn't see an issue there, since the same behavior can be
found even on the latest chipsets. But according to my understanding of the
ATA specs and my analysis of how real PATA devices behave, this behavior is
not correct. When an ATAPI device is connected to the first of two SATA
ports routed to the same legacy/PATA-emulated ATA channel (that is, as the
master device), the soft-reset sequence reports a false-positive slave ATAPI
device. The problem does not show up with ATA disk devices, or if some other
device is really attached to the slave port. It looks like the problem was
always there, but before ATA_CAM it usually went unnoticed because of the
very short IDENTIFY command timeouts in ata(4).
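
To make the reasoning above concrete, here is a minimal C sketch (it is not
the committed patch) of how the values in the quoted log can be read. The
device bit values are what I believe ata(4) defines in
sys/dev/ata/ata-all.h and should be checked against your tree.
slave_is_phantom() and its slave_phy_connected argument are hypothetical
names introduced only for illustration. 0x14/0xeb is the ATAPI signature
left in the cylinder low/high registers after a reset.

```c
#include <stdint.h>

/* Device-presence bits; assumed to match sys/dev/ata/ata-all.h. */
#define ATA_ATA_MASTER   0x00000001u
#define ATA_ATA_SLAVE    0x00000002u
#define ATA_ATAPI_MASTER 0x00010000u
#define ATA_ATAPI_SLAVE  0x00020000u

/* ATAPI signature in the cylinder low (lsb) / high (msb) registers. */
#define ATAPI_SIG_LSB 0x14
#define ATAPI_SIG_MSB 0xeb

/* Nonzero if the lsb/msb pair printed in the log is the ATAPI signature. */
static int
is_atapi_signature(uint8_t lsb, uint8_t msb)
{
	return (lsb == ATAPI_SIG_LSB && msb == ATAPI_SIG_MSB);
}

/* Number of ATAPI devices encoded in a "devices=0x..." mask. */
static int
atapi_count(uint32_t devices)
{
	return (((devices & ATA_ATAPI_MASTER) != 0) +
	    ((devices & ATA_ATAPI_SLAVE) != 0));
}

/*
 * Hypothetical check: a slave ATAPI device is reported although the
 * second SATA port has no PHY connection ("SATA connect timeout").
 */
static int
slave_is_phantom(uint32_t devices, int slave_phy_connected)
{
	return ((devices & ATA_ATAPI_SLAVE) != 0 && !slave_phy_connected);
}
```

With the values from the log, devices=0x30000 has both ATAPI bits set while
p1 reported a connect timeout, so slave_is_phantom(0x30000, 0) is true and
atapi_count(0x30000) is 2 instead of the expected 1.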

If somebody can give a better explanation or propose a better workaround,
it is welcome, as I do not like this solution very much.

> With regards to item #2: could this be at all related to OOB (bit 15)
> somehow being set in PCS (SATA register offset 0x92)?  I'm doubting it
> but I thought I'd ask.  My thought process, which is probably wrong
> (consider it an educational discussion :-) ):
> 
> The ICH9 specification states that the default value for this register
> is 0x0000, and b15=0 means "SATA controller will not retry after an OOB
> failure", while b15=1 causes the controller to indefinitely retry after
> OOB failure.  I imagine system BIOSes and other things can change this
> default value, but we don't seem to print it anywhere in
> ata_intel_chipinit() during a verbose boot.
> 
> Looking at chipsets/ata-intel.c, it looks like we only touch PCS in
> ata_intel_chipinit() and ata_intel_reset().  In the former, we avoid
> touching bits 4 through 15, and in the latter we mask out only what we
> want to adjust (e.g. the SATA port per ch variable).

As far as I can see, ata-intel.c should not change that bit if it was set
for some reason. Theoretically, OOB (Out-of-Band signaling) is a function of
the same state machine that sets the PHY status change flag. But frankly
speaking, I have no idea what the result of setting this bit would be. In
this legacy/PATA emulation mode too many things are undocumented to be sure
of anything.
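
For anyone who wants to check that bit on their own hardware, a small C
sketch follows. It assumes only what the quoted text states: PCS is a 16-bit
register at offset 0x92, and bit 15 selects indefinite retry after an OOB
failure. The macro and function names are made up for this example and are
not taken from ata-intel.c.

```c
#include <stdint.h>

#define PCS_OFFSET    0x92        /* register offset, as quoted above */
#define PCS_OOB_RETRY (1u << 15)  /* bit 15: retry forever after OOB failure */

/* Nonzero if the given PCS value has the OOB retry bit set. */
static int
pcs_oob_retry_enabled(uint16_t pcs)
{
	return ((pcs & PCS_OOB_RETRY) != 0);
}

/*
 * Read-modify-write helper: change only the bits selected by mask and
 * leave everything else (including bit 15) exactly as it was found.
 */
static uint16_t
pcs_update(uint16_t pcs, uint16_t mask, uint16_t value)
{
	return ((uint16_t)((pcs & ~mask) | (value & mask)));
}
```

For example, pcs_update(0x8003, 0x000f, 0x0001) yields 0x8001: the low
port bits change, while a bit 15 that was set by the BIOS stays set.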

-- 
Alexander Motin


