From: Greg Byshenk
Date: Sun, 20 Aug 2006 18:20:32 +0200
To: freebsd-stable@freebsd.org
Cc: Matt Dawson
Subject: Re: ATA problems again ... general problem of ICH7 or ATA?
Message-ID: <20060820162032.GE633@core.byshenk.net>
In-Reply-To: <200608201338.56109.matt@chronos.org.uk>

On Sun, Aug 20, 2006 at 01:38:55PM +0100, Matt Dawson wrote:
> On Sunday 20 August 2006 13:00, freebsd-stable-request@freebsd.org wrote:

> > Do you mean different type of cables, or just another piece? I can't
> > change cables by myself, servers are dedicated from provider, but as I
> > can saw, they picked whole new machine from their HW storage and put new
> > Samsung disk drives in. So these two last machines are brand new with
> > new cables. (Probably with a same type of cables - all machines are ASUS
> > RS120)

> I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64 based
> system running 6.1-RELEASE-p3 (i386). ad6 just detaches without warning and
> it takes a reboot to bring it back. atacontrol reinit has no effect. Tried
> the following to resolve the problems:

> - Changed cables (both ad4 and ad6)
> - Changed SATA power to legacy
> - Moved the NIC and anything else from the shared PCI INT (thought I'd cracked
>   it at this point as it was stable for a month, then it lost ad6 on a nightly
>   dump)
> - Remade my gmirror array as an ar. Put it straight back to gmirror again when
>   I found out what a pain it is to rebuild after ad6 disappears.

I am not sure if it is related, but...

I experienced a similar sort of problem, although the details in my
case are quite different. What was similar was that I would
inexplicably "lose" two ATA drives from an array. Reconfiguring the
same drives and rebuilding would make them work perfectly again --
for some number of days, after which the same failure would occur.

What is different is that this was with a 3Ware RAID controller --
which made removing/reconfiguring/rebuilding much easier -- but I
was seeing the exact same errors.

This happened four times (with the same errors that have been
discussed here), running 6.1-STABLE as of June 22. Before attempting
to RMA the drives, I tried an updated kernel, 6.1-STABLE as of July 19.
Strangely enough, the problems disappeared.

So, while I have not checked everything that changed between those
two dates, it _might_ be worth trying a recent 6.1-STABLE...

-- 
greg byshenk - gbyshenk@byshenk.net - Leiden, NL