From owner-freebsd-current@FreeBSD.ORG  Mon Mar  2 19:44:02 2009
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 014001065C5B
	for <freebsd-current@freebsd.org>; Mon,  2 Mar 2009 19:44:01 +0000 (UTC)
	(envelope-from mav@mavhome.dp.ua)
Received: from cmail.optima.ua (cmail.optima.ua [195.248.191.121])
	by mx1.freebsd.org (Postfix) with ESMTP id 319CD8FC13
	for <freebsd-current@freebsd.org>; Mon,  2 Mar 2009 19:44:00 +0000 (UTC)
	(envelope-from mav@mavhome.dp.ua)
X-Spam-Flag: SKIP
X-Spam-Yversion: Spamooborona-2.1.0
Received: from [212.86.226.226] (account mav@alkar.net HELO
	mavbook.mavhome.dp.ua)
	by cmail.optima.ua (CommuniGate Pro SMTP 5.2.9)
	with ESMTPSA id 236339470; Mon, 02 Mar 2009 21:43:59 +0200
Message-ID: <49AC36FF.2080509@mavhome.dp.ua>
Date: Mon, 02 Mar 2009 21:43:59 +0200
From: Alexander Motin <mav@mavhome.dp.ua>
User-Agent: Thunderbird 2.0.0.19 (X11/20090118)
MIME-Version: 1.0
To: Elliot Schlegelmilch <elliot@schlegelmilch.org>
References: <go44ht$2i6a$1@FreeBSD.cs.nctu.edu.tw>
	<1235602472.00079680.1235592003@10.7.7.3>
	<1235658185.00079898.1235647801@10.7.7.3>
	<1235863381.00080963.1235851802@10.7.7.3>
	<49AAB0A6.3040304@mavhome.dp.ua>
	<20090302190759.GA95194@schlegelmilch.org>
In-Reply-To: <20090302190759.GA95194@schlegelmilch.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Mailman-Approved-At: Mon, 02 Mar 2009 19:56:24 +0000
Cc: FreeBSD-Current <freebsd-current@freebsd.org>
Subject: Re: SATA disks suddenly stop working
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Mar 2009 19:44:04 -0000

Elliot Schlegelmilch wrote:
> Alexander Motin wrote:
> 
> [snip]
> 
>>> ata2: <ATA channel 0> on atapci1
>>> ata2: AHCI reset...: 2
>>> ata2: SATA connect time=0ms
>>> ata2: ready wait time=0ms52 (12272 MB)
>>> ata2: software reset port 15...
>>> ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
>>> ata2: software reset set timeout
>>> ata2: software reset port 0...
>>> ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
>>> ata2: software reset set timeout
>>> ata2: SIGNATURE: ffffffff
>>> ata2: Unknown signature, assuming disk device
>>> ata2: AHCI reset done: devices=00000001
>>> ata2: [MPSAFE]
>>> ata2: [ITHREAD]
>>>
>>> One for each channel, up to ata7. 
>> Does it happen during boot or what do you mean by unable to reattach 
>> drive now?
> 
> Yes, I saw the above during boot. 
> 
> What I mean by unable to reattach is describing the old behavior:
> Sometimes my ad12 would fall off the bus, and I could usually retrieve
> it by 'atacontrol detach ata6; atacontrol attach ata6;'
> Now it's: ata6: still BUSY after softreset
> and attempting the detach/attach results in:
> 
> Tracing pid 12 tid 100007 td 0xffffff0001afb390
> device_get_parent() at device_get_parent+0x1
> ata_start() at ata_start+0x1c5
> ata_reinit() at ata_reinit+0x1dd
> ata_completed() at ata_completed+0x75
> softclock() at softclock+0x291
> intr_event_execute_handlers() at intr_event_execute_handlers+0x68
> ithread_loop() at ithread_loop+0xb2
> fork_exit() at fork_exit+0x12a
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xfffffffe4004ad40, rbp = 0 ---
> 
> This isn't a huge deal, and is probably a red herring, as I suspect
> the disk is going bad at this point. This is running Feb 1 kernel, as
> I recall. However, it can and has stayed attached for weeks at a time
> before.

If it happens with the Feb 1 kernel, then the drive is probably just 
getting worse. It's not me. :)

>>> atapci0@pci0:0:31:1:    class=0x01018a card=0x948115d9 chip=0x269e8086 
>>> rev=0x09 hdr=0x00
>>>     vendor     = 'Intel Corporation'
>>>     device     = '631xESB/632xESB/3100 Ultra ATA Storage Controller'
>>>     class      = mass storage
>>>     subclass   = ATA
>>>
>>> The last known kernel which works was Dec 17, but trying to rebuild a
>>> kernel from that date doesn't see the SATA disks either (as the kernel
>>> which sees the disks zfs doesn't work.) Or perhaps I'm csup'ing
>>> incorrectly.
>> Haven't you tried to just touched reset sequence on 15.
> 
> Do you mean a kernel on Feb 15? Was there more that happened between
> 15th and the 22nd or so?

You have made something unreadable by cutting and gluing together two of 
my sentences. :) Yes, a lot happened between Feb 15 and 22.

>> When you succeed to boot, can you try to make some experiments against 
>> HEAD, may be some of them fix the problem:
>> 1) comment that line inside ata_ahci_issue_cmd():
>>     ATA_OUTL(ctlr->r_res2, ATA_AHCI_P_FBS + offset, (port << 8) | 
>> 0x00000001);
>>
>> 2) comment these lines inside ata_sata_phy_reset():
>>     if ((ATA_IDX_INL(ch, ATA_SCONTROL) & ATA_SC_DET_MASK) == 
>> ATA_SC_DET_IDLE)
>>     return ata_sata_connect(ch);
>>
>> 3) comment first that line inside ata_ahci_softreset():
>>     return (-1);
>>
>> Thanks.
>>
> 
> I'll try these patches and report back right after I freshen up my backups. :)

The first one is already committed to HEAD. It may be related to your 
problem.
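
For reference, here is a minimal sketch of the value that the line in 
experiment 1 writes, (port << 8) | 0x00000001. The helper name and the 
macro names are hypothetical and not part of the ata(4) driver. The bit 
meanings (bit 0 as the enable bit, the port number field starting at bit 8) 
are my reading of the FBS register description in the AHCI specification, 
so treat them as an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the ata(4) sources: it only
 * reproduces the expression (port << 8) | 0x00000001 from the quoted
 * ATA_OUTL() call so the resulting register value can be inspected.
 */
#define FBS_EN        0x00000001u /* assumed: bit 0 is the enable bit */
#define FBS_DEV_SHIFT 8           /* assumed: port number field starts at bit 8 */

static uint32_t
fbs_value(unsigned int port)
{
	/* Place the port number in the field at bit 8 and set bit 0. */
	return (((uint32_t)port << FBS_DEV_SHIFT) | FBS_EN);
}
```

With the two ports seen in your log, fbs_value(15) gives 0x00000f01 and 
fbs_value(0) gives 0x00000001.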

-- 
Alexander Motin