From: Alexander Motin
To: Elliot Schlegelmilch
Cc: FreeBSD-Current
Date: Mon, 02 Mar 2009 21:43:59 +0200
Subject: Re: SATA disks suddenly stop working
Message-ID: <49AC36FF.2080509@mavhome.dp.ua>
In-Reply-To: <20090302190759.GA95194@schlegelmilch.org>
References: <1235602472.00079680.1235592003@10.7.7.3> <1235658185.00079898.1235647801@10.7.7.3> <1235863381.00080963.1235851802@10.7.7.3> <49AAB0A6.3040304@mavhome.dp.ua> <20090302190759.GA95194@schlegelmilch.org>

Elliot Schlegelmilch wrote:
> Alexander Motin wrote:
>
> [snip]
>
>>> ata2: on atapci1
>>> ata2: AHCI reset...: 2
>>> ata2: SATA connect time=0ms
>>> ata2: ready wait time=0ms52 (12272 MB)
>>> ata2: software reset port 15...
>>> ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
>>> ata2: software reset set timeout
>>> ata2: software reset port 0...
>>> ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
>>> ata2: software reset set timeout
>>> ata2: SIGNATURE: ffffffff
>>> ata2: Unknown signature, assuming disk device
>>> ata2: AHCI reset done: devices=00000001
>>> ata2: [MPSAFE]
>>> ata2: [ITHREAD]
>>>
>>> One for each channel, up to ata7.
>> Does it happen during boot or what do you mean by unable to reattach
>> drive now?
>
> Yes, I saw the above during boot.
>
> What I mean by unable to reattach is describing the old behavior:
> Sometimes my ad12 would fall off the bus, and I could usually retrieve
> it by 'atacontrol detach ata6; atacontrol attach ata6;'
> Now it's: ata6: still BUSY after softreset
> and attempting the detach/attach results in:
>
> Tracing pid 12 tid 100007 td 0xffffff0001afb390
> device_get_parent() at device_get_parent+0x1
> ata_start() at ata_start+0x1c5
> ata_reinit() at ata_reinit+0x1dd
> ata_completed() at ata_completed+0x75
> softclock() at softclock+0x291
> intr_event_execute_handlers() at intr_event_execute_handlers+0x68
> ithread_loop() at ithread_loop+0xb2
> fork_exit() at fork_exit+0x12a
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xfffffffe4004ad40, rbp = 0 ---
>
> This isn't a huge deal, and is probably a red herring, as I suspect
> the disk is going bad at this point. This is running Feb 1 kernel, as
> I recall. However, it can and has stayed attached for weeks at a time
> before.

If it happens with the Feb 1 kernel, then the drive is probably just
getting worse. It's not my doing. :)

>>> atapci0@pci0:0:31:1: class=0x01018a card=0x948115d9 chip=0x269e8086
>>> rev=0x09 hdr=0x00
>>> vendor = 'Intel Corporation'
>>> device = '631xESB/632xESB/3100 Ultra ATA Storage Controller'
>>> class = mass storage
>>> subclass = ATA
>>>
>>> The last known kernel which works was Dec 17, but trying to rebuild a
>>> kernel from that date doesn't see the SATA disks either (as the kernel
>>> which sees the disks zfs doesn't work.) Or perhaps I'm csup'ing
>>> incorrectly.
>> Haven't you tried to just touched reset sequence on 15.
>
> Do you mean a kernel on Feb 15? Was there more that happened between
> 15th and the 22nd or so?

You made that unreadable by cutting and gluing together two of my
sentences. :) Yes, a lot happened between Feb 15 and 22.

>> When you succeed to boot, can you try to make some experiments against
>> HEAD, may be some of them fix the problem:
>> 1) comment that line inside ata_ahci_issue_cmd():
>> ATA_OUTL(ctlr->r_res2, ATA_AHCI_P_FBS + offset, (port << 8) |
>> 0x00000001);
>>
>> 2) comment these lines inside ata_sata_phy_reset():
>> if ((ATA_IDX_INL(ch, ATA_SCONTROL) & ATA_SC_DET_MASK) ==
>> ATA_SC_DET_IDLE)
>> return ata_sata_connect(ch);
>>
>> 3) comment first that line inside ata_ahci_softreset():
>> return (-1);
>>
>> Thanks.
>>
>
> I'll try these patches and report back right after I freshen up my backups. :)

The first one is already committed to HEAD. It may be related to your
problem.

--
Alexander Motin
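A note on the boot log quoted above: the "SIGNATURE: ffffffff" and "Unknown signature, assuming disk device" lines appear to come from classifying the attached device by its AHCI port signature (PxSIG) after the software reset. The sketch below is a hypothetical helper, not the ata(4) driver's code; the enum and function names are invented for illustration. It assumes the AHCI layout in which PxSIG holds fields of the first D2H Register FIS received after reset (LBA high in bits 31:24, LBA mid in bits 23:16) and the commonly used class values for those two bytes. PxSIG reads 0xFFFFFFFF until such a FIS arrives, so an all-ones value is consistent with the "software reset set timeout" lines logged just before it.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not FreeBSD's ata(4) code: classify a SATA device
 * from the 32-bit AHCI port signature (PxSIG) read after a software reset.
 * Assumed layout: bits 31:24 LBA high, 23:16 LBA mid,
 *                 15:8 LBA low, 7:0 sector count.
 */
enum sata_dev_type {
    SATA_DEV_ATA,      /* plain ATA disk (LBA high:mid = 0x0000) */
    SATA_DEV_ATAPI,    /* packet device such as CD/DVD (0xEB14) */
    SATA_DEV_PMP,      /* port multiplier (0x9669) */
    SATA_DEV_SEMB,     /* enclosure management bridge (0xC33C) */
    SATA_DEV_UNKNOWN   /* anything else, including the 0xFFFFFFFF reset value */
};

static enum sata_dev_type
sata_classify_sig(uint32_t sig)
{
    /* Only LBA high:LBA mid (the upper 16 bits) identify the device class. */
    switch (sig >> 16) {
    case 0x0000:
        return SATA_DEV_ATA;
    case 0xeb14:
        return SATA_DEV_ATAPI;
    case 0x9669:
        return SATA_DEV_PMP;
    case 0xc33c:
        return SATA_DEV_SEMB;
    default:
        return SATA_DEV_UNKNOWN;
    }
}
```

For example, sata_classify_sig(0xffffffff) falls through to SATA_DEV_UNKNOWN, the case in which the log above reports that it is assuming a disk device.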