From owner-freebsd-stable@FreeBSD.ORG  Sat Dec 20 14:14:08 2003
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id F314116A4CE
	for <freebsd-stable@freebsd.org>;
	Sat, 20 Dec 2003 14:14:07 -0800 (PST)
Received: from lakemtao06.cox.net (lakemtao06.cox.net [68.1.17.115])
	by mx1.FreeBSD.org (Postfix) with ESMTP id BD41E43D72
	for <freebsd-stable@freebsd.org>;
	Sat, 20 Dec 2003 14:13:26 -0800 (PST)
	(envelope-from kitbsdlists@HotPOP.com)
Received: from vixen42 ([68.109.49.234]) by lakemtao06.cox.net
          (InterMail vM.5.01.06.05 201-253-122-130-105-20030824) with SMTP
          id <20031220221321.PNLT24575.lakemtao06.cox.net@vixen42>;
          Sat, 20 Dec 2003 17:13:21 -0500
Date: Sat, 20 Dec 2003 16:12:01 -0600
From: Vulpes Velox <kitbsdlists@HotPOP.com>
To: "Oivind H. Danielsen" <oivind.danielsen@kopek.net>
Message-Id: <20031220161201.60833ea2.kitbsdlists@HotPOP.com>
In-Reply-To: <NMEPLAHDNAPMGKOIJMLLIEFNCAAA.oivind.danielsen@kopek.net>
References: <NMEPLAHDNAPMGKOIJMLLIEFNCAAA.oivind.danielsen@kopek.net>
X-Mailer: Sylpheed version 0.9.6claws (GTK+ 1.2.10; i386-portbld-freebsd4.9)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
cc: freebsd-stable@freebsd.org
Subject: Re: WRITE command timeout
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
	<freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Dec 2003 22:14:08 -0000

On Sat, 20 Dec 2003 19:07:41 +0100
"Oivind H. Danielsen" <oivind.danielsen@kopek.net> wrote:

> Hello.
> 
> We have been running FreeBSD 4.6-5.1 systems for 1.5 years and are being
> plagued by these:
> 
>  Dec 18 15:15:39 <> /kernel: ad0: WRITE command timeout tag=0 serv=0 -
> resetting
>  Dec 19 15:03:23 <> /kernel: ad0: READ command timeout tag=0 serv=0 -
> resetting

This is most likely caused by the drive going bad or a bad cable.
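One way to check whether the timeouts follow a particular drive or stay with a particular box is to tally them per host, device and command across the logs of all six machines, then swap drives between a "good" and a "bad" box and tally again. Below is a minimal sketch in Python. It assumes the kernel messages look like the ones quoted above (" Dec 18 15:15:39 <host> /kernel: ad0: WRITE command timeout ..."). The regex, the function name tally_timeouts and the script layout are hypothetical and not part of any FreeBSD tool. The "timeout waiting for DRQ" messages are deliberately not counted.

```python
import re
import sys
from collections import Counter

# Assumed syslog format, based on the messages quoted above:
#   Dec 18 15:15:39 host /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
TIMEOUT_RE = re.compile(
    r"^\s*\w{3}\s+\d+\s+\S+\s+(?P<host>\S+)\s+/kernel:\s+"
    r"(?P<dev>ad\d+):\s+(?P<cmd>[A-Z_]+) command timeout"
)


def tally_timeouts(lines):
    """Count ATA 'command timeout' messages per (host, device, command)."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            counts[(m.group("host"), m.group("dev"), m.group("cmd"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python tally_ata.py /var/log/messages other-box.log ...
    total = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            total.update(tally_timeouts(f))
    for (host, dev, cmd), n in sorted(total.items()):
        print(host, dev, cmd, n)
```

If the counts move with a drive after it is swapped into another box, the drive is the more likely culprit; if they stay with the nForce boxes, the board or cabling is.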

> In our rack we have 34 identical drives (IBM IC35L080AVVA07).
> 
>   24 drives on Windows 2000    : no problems.
>    4 drives on Linux 2.4.x     : no problems.
> 
>    2 drives on RELENG_4_8
>     (VIA 82C686, VIA C3)       : no problems
> 
>    4 drives on RELENG_4_8
>     (nVIDIA nForce, XP 2000+)  : r/w timeouts, fs corruption.
> 
>   (1 drive/system, 6 FreeBSD boxes)
> 
> The good systems have been running the 1.5 years without a hitch. The
> four identical RELENG_4_8 systems have all had corrupted filesystems (at
> least once every two months).
> 
> 
> We have tried the following:
> 
>  - Changed ATA100 cables (3 diff. types, all 80-wire)
>  - Disabled DMA (use PIO4) (hw.ata.ata_dma="0" in loader.conf)
>  - Disabled DMA in BIOS setup
>  - Changed motherboard (MSI MS6734, VIA KM400, vt8235 ATA)
>  - Changed power supply (added 100W)
>  - RELENG_5_1.
> 
> None of these changes has helped. The only change seen when disabling
> DMA is additional messages: "timeout waiting for DRQ - resetting".
> 
> 
> I have searched the net for more information on this topic for over a
> year, and all I find is replies like:
> 
>   - "Just change the cable, dude.."   (did that, still timeouts)
>   - "IBM drives are bad for you."     (seen this with other drives too)
>                                       (drives work well on Linux/W2k)
>   - "Disabling DMA fixes it."         (tried that, it didn't)
>   - "ATA is for wimps. SCSI rulezz."  (different discussion)
> 
> 
> # sysctl hw.ata
> hw.ata.ata_dma: 0
> hw.ata.wc: 1
> hw.ata.tags: 0
> hw.ata.atapi_dma: 0
> 
> # atacontrol mode 0
> Master = PIO4
> Slave  = ???
> 
> # atacontrol info 0
> Master:  ad0 <IC35L080AVVA07-0/VA4OA52A> ATA/ATAPI rev 5
> Slave:       no device present
> 
> 
> dmesg, pciconf and kernel config are attached. No special compilation
> options (except -DIPFW2) are used. I can provide more information on
> request.
> 
> We're now running FreeBSD 4.8-RELEASE-p14 and FreeBSD 5.1-RELEASE-p8,
> but the problem has been around since we started out with 4.6 I
> believe. The "good" and "bad" FreeBSD systems all use the same
> kernel/world.
> 
> 
> The reason why we have used such low-end hardware in these boxes is that
> they are part of a highly redundant cluster solution for crypto
> processing (no storage is used for application purposes). This means the
> system can cope with the occasional fs corruption, but we would still
> prefer to get rid of it.
> 
> 
> I know this problem has been discussed before, but wanted to add more
> data to the discussion. I don't think all of the reports should be
> attributed to bad HW. Nevertheless, even if the hardware is broken, the
> system should preferably function equally well/bad as with Linux/W2k.
> 
> 
> Any help is greatly appreciated.
> 
> 
> Best Regards,
> 
> Oivind H. Danielsen
>