Date: Fri, 03 Jun 2011 12:51:01 +0200
From: Holger Kipp <holger.kipp@alogis.com>
To: Alexander Motin <mav@freebsd.org>
Cc: "stable@freebsd.org" <stable@freebsd.org>, Jeremy Chadwick <freebsd@jdc.parodius.com>
Subject: Re: 8-STABLE won't boot with ZFSv28
Message-ID: <4DE8BC95.2030008@alogis.com>
In-Reply-To: <4DE74BCD.8080002@FreeBSD.org>
References: <814C9E9472FDCC40AAC3FC95A2D67E3B0BD88C69@msx3.exchange.alogis.com>
 <20110601085454.GA19434@icarus.home.lan>
 <814C9E9472FDCC40AAC3FC95A2D67E3B0BD88DC0@msx3.exchange.alogis.com>
 <20110601095610.GA20255@icarus.home.lan>
 <814C9E9472FDCC40AAC3FC95A2D67E3B0BD88F48@msx3.exchange.alogis.com>
 <814C9E9472FDCC40AAC3FC95A2D67E3B0BD890BD@msx3.exchange.alogis.com>
 <4DE73386.5040505@FreeBSD.org>
 <20110602075118.GA42026@icarus.home.lan>
 <4DE74BCD.8080002@FreeBSD.org>
Hi all,

as yesterday was a bank holiday in Germany, I wasn't in the office to
try the patch linked in the email.

Is the consensus that I should try the patch located here:

>>> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/chipsets/ata-intel.c.diff?r1=1.25;r2=1.26

and report the result? Or do you need some additional discussion on
this topic? I really don't know much about the ata-intel chipset
programming interface, which is why I'm asking :-)

Best regards,
Holger

on 02.06.2011 10:37, Alexander Motin wrote:
> Jeremy Chadwick wrote:
>> On Thu, Jun 02, 2011 at 09:53:58AM +0300, Alexander Motin wrote:
>>> Holger Kipp wrote:
>>>> got the same messages over and over again - panic took some time:
>>>>
>>>> unknown: WARNING - ATAPI_IDENTIFY requeued due to channel reset LBA=0
>>>> ata0: reinit done ..
>>>> ata0: reiniting channel ..
>>>> ata0: DISCONNECT requested
>>>>
>>>> <short delay here>
>>>>
>>>> ata0: p0: SATA connect time=0ms status=00000113
>>>> ata0: p1: SATA connect timeout status=00000000
>>>> ata0: reset tp1 mask=03 ostat0=00 ostat1=00
>>>> ata0: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
>>>> ata0: stat1=0x00 err=0x01 lsb=0x14 msb=0xeb
>>>> ata0: reset tp2 stat0=00 stat1=00 devices=0x30000
>>>> unknown: WARNING - ATAPI_IDENTIFY requeued due to channel reset LBA=0
>>>> ata0: reinit done ..
>>>> ata0: reiniting channel ..
>>>> ata0: DISCONNECT requested
>>> I see two problems here:
>>> 1. "devices=0x30000" means that two ATAPI devices were detected instead
>>> of one. I can also reproduce it with other Intel chipsets. It looks like
>>> a hardware bug to me. It can be worked around by reconnecting the ATAPI
>>> device to an even (2 or 4) SATA port, or connecting any other device there.
>>> 2. "DISCONNECT requested" means that the controller reported a PHY status
>>> change for some device on the channel, triggering infinite retry.
>>> Unfortunately I have no ICH9 board, and I can't reproduce it with ICH10
>>> or above.
>>>
>>> This patch should work around the first problem in software:
>>> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/ata/chipsets/ata-intel.c.diff?r1=1.25;r2=1.26
>>> Please try it, and let's see if with some luck it does something about
>>> the second problem.
>>
>> With regards to item #1: I don't see anything in the ICH9 errata that
>> indicates a silicon bug if the only device attached to the controller is
>> an ATAPI device and connected to SATA port 0 (presumably), or an
>> odd-numbered port?  If this problem exists on other ICHxx and/or ESBxx
>> chips, I sure would hope it'd be documented.
>>
>> I haven't tried confirming it myself, but if need be I can set up a test
>> box with a SATA-based DVD drive hooked up to it + provide remote serial
>> console/etc. if it'd be of any help.  I don't think it would be (sounds
>> like you have lots of hardware :-) ), but I'm willing to help in any way
>> I can.
>
> Intel probably doesn't see an issue there, as the same behavior can be
> found even on the latest chipsets. But according to my understanding of
> the ATA specs and my analysis of real PATA device behavior, this behavior
> is not correct. When an ATAPI device is connected to the first of two
> SATA ports routed to the same legacy-/PATA-emulated ATA channel (master
> device), the soft-reset sequence returns a false-positive slave ATAPI
> device presence. The problem does not show up with ATA disk devices, or
> if some other device really is attached to the slave port. The problem
> looks like it was always there, but before ATA_CAM it usually went
> unnoticed, due to very small IDENTIFY command timeouts in ata(4).
>
> If somebody can give a better explanation or propose a better
> workaround -- welcome, as I do not much like this solution.
>
>> With regards to item #2: could this be at all related to OOB (bit 15)
>> somehow being set in PCS (SATA register offset 0x92)?  I'm doubting it
>> but I thought I'd ask.  My thought process, which is probably wrong
>> (consider it an educational discussion :-) ):
>>
>> The ICH9 specification states that the default value for this register
>> is 0x0000, and b15=0 means "SATA controller will not retry after an OOB
>> failure", while b15=1 causes the controller to indefinitely retry after
>> OOB failure.  I imagine system BIOSes and other things can change this
>> default value, but we don't seem to print it anywhere in
>> ata_intel_chipinit() during a verbose boot.
>>
>> Looking at chipsets/ata-intel.c, it looks like we only touch PCS in
>> ata_intel_chipinit() and ata_intel_reset().  In the former, we avoid
>> touching bits 4 through 15, and in the latter we mask out only what we
>> want to adjust (e.g. the SATA port per ch variable).
>
> As far as I can see, ata-intel.c should not change that bit if it was
> set for some reason. Theoretically, OOB (Out-of-Band signaling) is a
> function of the same state machine that sets the PHY status change
> flag. But frankly speaking, I have no idea what the result of setting
> this bit would be. In this legacy/PATA emulation mode there are too
> many undocumented things to be sure of anything.
>
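
For readers following the log excerpts and register discussion quoted above, here is a minimal sketch of how the values could be decoded by hand. It is illustrative only and is not code from ata(4). The device-mask bit values are an assumption: they are meant to mirror the ATA_ATA_MASTER/ATA_ATA_SLAVE/ATA_ATAPI_MASTER/ATA_ATAPI_SLAVE definitions as I recall them from sys/dev/ata/ata-all.h, so check them against your own source tree. The 0x14/0xEB pair is the standard ATAPI device signature, and the PCS bit 15 meaning is taken from Jeremy's reading of the ICH9 specification in this thread. The function names are hypothetical.

```python
# Assumed to mirror the ata(4) device-mask bits (ata-all.h); verify locally.
DEVICE_BITS = {
    0x00000001: "ATA master",
    0x00000002: "ATA slave",
    0x00010000: "ATAPI master",
    0x00020000: "ATAPI slave",
}

# Per the thread: PCS (offset 0x92) bit 15 selects indefinite OOB retry.
PCS_OOB_RETRY_BIT = 1 << 15


def decode_devices(mask):
    """Return the device names whose bits are set in a 'devices=' mask."""
    return [name for bit, name in sorted(DEVICE_BITS.items()) if mask & bit]


def is_atapi_signature(lsb, msb):
    """True if the lsb/msb bytes from a reset line carry the ATAPI signature."""
    return (lsb, msb) == (0x14, 0xEB)


def pcs_oob_retry_enabled(pcs):
    """True if bit 15 is set in a 16-bit PCS register value."""
    return bool(pcs & PCS_OOB_RETRY_BIT)


if __name__ == "__main__":
    # Values taken from the log lines quoted in this message.
    print(decode_devices(0x30000))
    print(is_atapi_signature(0x14, 0xEB))
    # 0x0000 is the documented default mentioned in the thread.
    print(pcs_oob_retry_enabled(0x0000))
```

Under those assumptions, devices=0x30000 decodes to both an ATAPI master and an ATAPI slave, which matches the "two ATAPI devices were detected instead of one" reading given in the quoted analysis.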