From: Pieter de Boer
Date: Fri, 18 Jun 2010 09:51:01 +0200
To: Matthew Lear
Cc: freebsd-stable@freebsd.org
Subject: Re: 7.2-RELEASE-p4, IO errors & RAID1 failure
Message-ID: <4C1B2565.9010509@thelostparadise.com>
In-Reply-To: <1276844904.7519.19.camel@almscliff.bubblegen.co.uk>

Hi Matthew,

> I'm running 7.2-RELEASE-p4 on an i386 HP server (ML G5) in RAID1
> configuration. Very recently, I've seen IO errors such as:
>
> ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=20472527
>
> reported and the RAID mirror is now offline.
>
> ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=395032335
> ad0: FAILURE - WRITE_DMA48 status=51
> error=10 LBA=395032335
> ar0: WARNING - mirror protection lost. RAID1 array in DEGRADED mode
>

I had more or less the same timeout issues on my 8.0-RELEASE box, a Dell
R300 with SATA disks. What I did was raise the ATA timeout from 5 seconds
to 20. I did this by patching the kernel code while it was running, but
I'm not sure you'd like that approach ;)

In http://www.freebsd.org/cgi/query-pr.cgi?pr=111023 a patch is presented
that raises the timeouts by patching a few ATA kernel source files. This
has been committed to RELENG_7 as well, so by upgrading your 7.2 install
to the latest RELENG_7 (or RELENG_8), you'll have that timeout fix.

Why ATA commands can take longer than 5 seconds although the disks appear
to be fine, I wouldn't know.

-- 
Pieter
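
P.S. If it helps to see whether the timeouts cluster on one disk or one command before and after the upgrade, something like the rough Python sketch below could tally them. It is only an illustration: the regular expression is modelled on the messages quoted above and assumes each kernel message sits on a single line (as it does in the kernel log rather than in wrapped mail), and the names ATA_EVENT, SAMPLE and summarize are made up for this example.

```python
import re
from collections import Counter

# Pattern modelled on the messages quoted in this thread, e.g.
#   ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=20472527
#   ad0: FAILURE - WRITE_DMA48 status=51 error=10 LBA=395032335
# Assumption: one kernel message per line; other formats are ignored.
ATA_EVENT = re.compile(
    r"^(?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - (?P<cmd>\w+)\b.*?LBA=(?P<lba>\d+)"
)

# Sample input taken from the quoted report (FAILURE line unwrapped).
SAMPLE = [
    "ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=20472527",
    "ad0: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=395032335",
    "ad0: FAILURE - WRITE_DMA48 status=51 error=10 LBA=395032335",
    "ar0: WARNING - mirror protection lost. RAID1 array in DEGRADED mode",
]


def summarize(lines):
    """Count ATA TIMEOUT/FAILURE events per (device, kind, command)
    and collect the distinct LBAs seen for each device."""
    counts = Counter()
    lbas = {}
    for line in lines:
        m = ATA_EVENT.match(line.strip())
        if not m:
            continue  # not an adN TIMEOUT/FAILURE line, e.g. the ar0 warning
        counts[(m["dev"], m["kind"], m["cmd"])] += 1
        lbas.setdefault(m["dev"], set()).add(int(m["lba"]))
    return counts, lbas


counts, lbas = summarize(SAMPLE)
for (dev, kind, cmd), n in sorted(counts.items()):
    print(f"{dev} {kind} {cmd}: {n}")
```

To use it on a real box you would pass the lines of your kernel log to summarize() instead of SAMPLE. Errors spread over many unrelated LBAs would at least be consistent with a timeout problem rather than one bad region of the disk, though that is a hint and not a diagnosis.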