Date:      Mon, 27 Oct 2008 19:41:43 -0700
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Carl Voth <cvoth@telus.net>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: gmirror slice insertion, "FAILURE - READ_DMA status=51<READY, DSC, ERROR>"
Message-ID:  <20081028024143.GA37131@icarus.home.lan>
In-Reply-To: <49067148.6080307@telus.net>
References:  <49067148.6080307@telus.net>

On Mon, Oct 27, 2008 at 06:56:24PM -0700, Carl Voth wrote:
> I'm setting up a dual-disk server and am trying to bring it up with  
> gmirror and gjournal. One slice per disk, the goal being to create a  
> single mirror from said slices with some of the partitions journaled.  
> Installed FreeBSD-7.0RELEASE to ad4, then used technique from here to  
> create single-disk mirror/gm0 on ad6:
>
>   http://people.freebsd.org/~rse/mirror/
>
> Modified ad4s1a /boot.config to pass control to boot stage 3 on ad6. So  
> far, so good. Began Ralf's procedure for inserting ad4s1 into  
> mirror/gm0. The synchronization began and reached 6% when this little  
> horror appeared:
>
> ad6: FAILURE - READ_DMA status=51<READY,DSC,ERROR>  
> error=40<UNCORRECTABLE> LBA=134802751

Are you sure you don't have a bad hard disk?  This looks like a classic
block/sector failure.  It does not appear to be the infamous "DMA
timeout" problem, especially if this is the only error you're getting.

> I reinstalled FB7 to ad4, redid the /boot.config modification to make  
> ad6/gm0 bootable again and retried the insertion of ad4 into gm0. Exact  
> same error messages at exactly the same point with same consequences.  

So you're saying that the *exact* same READ_DMA error, at the *exact*
same LBA, is reported on ad4?  If so, that's very bizarre.
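
To put that LBA in context, here is a rough sketch in Python that turns
it into a byte offset and a position on the disk.  It assumes 512-byte
sectors and a total of 1,953,525,168 sectors, which is the usual figure
for a 1TB drive; both are assumptions on my part, and the real sector
count should come from "atacontrol cap".  The function name is made up
for illustration.

```python
# Assumed geometry: 512-byte sectors and the sector count commonly
# reported by 1TB drives. Replace both with the values from
# "atacontrol cap" for the actual disk.
SECTOR_SIZE = 512
TOTAL_SECTORS = 1_953_525_168


def locate(lba, total_sectors=TOTAL_SECTORS, sector_size=SECTOR_SIZE):
    """Return (byte offset, percent of the way into the disk) for an LBA."""
    offset = lba * sector_size
    percent = 100.0 * lba / total_sectors
    return offset, percent


if __name__ == "__main__":
    offset, percent = locate(134802751)  # LBA from the error message
    print(f"byte offset: {offset}")
    print(f"position:    {percent:.1f}% into the disk")
```

Under those assumptions the failing sector sits a little under 7% of the
way into the disk, which roughly matches a synchronization that stops at
6%.  gmirror reports progress against the mirror provider rather than
the raw disk, so treat this as an approximation only.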

> Now, I see that other folks are having unexplained DMA problems too,  
> albeit in different contexts. What should I be concluding here? Those  
> other folks don't seem to be concluding it's bad drives. If there were  
> bad sectors, I'd get different error messages, yes?

The "error=40<UNCORRECTABLE>" part of what you're seeing indicates that
the drive reported an uncorrectable read error.  What other people see
are DMA timeouts, with no actual sign of uncorrectable errors.
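
As an illustration of how status=51 and error=40 break down into the
flags printed between the angle brackets, here is a minimal sketch in
Python.  The names READY, DSC, ERROR and UNCORRECTABLE come straight
from the log line; the remaining bit names follow the usual ATA register
layout and are my assumption about how the driver labels them.

```python
# Bit masks for the ATA status register. READY, DSC and ERROR are the
# names seen in the log; the other labels are assumed.
STATUS_BITS = {
    0x80: "BUSY",
    0x40: "READY",
    0x20: "DMA_READY",
    0x10: "DSC",
    0x08: "DRQ",
    0x04: "CORRECTABLE",
    0x02: "INDEX",
    0x01: "ERROR",
}

# Bit masks for the ATA error register. Only UNCORRECTABLE appears in
# the log; the other labels are assumed.
ERROR_BITS = {
    0x80: "ICRC",
    0x40: "UNCORRECTABLE",
    0x20: "MEDIA_CHANGED",
    0x10: "NID_NOT_FOUND",
    0x08: "MEDIA_CHANGE_REQUEST",
    0x04: "ABORTED",
    0x02: "NO_MEDIA",
    0x01: "ILLEGAL_LENGTH",
}


def decode(value, bits):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in bits.items() if value & mask]


if __name__ == "__main__":
    # The log prints the registers in hex: status=51, error=40.
    print("status:", ",".join(decode(0x51, STATUS_BITS)))
    print("error: ", ",".join(decode(0x40, ERROR_BITS)))
```

The point is that the ERROR bit in the status register only says that
something went wrong; the error register is what names the cause, and
here it is the uncorrectable-data bit rather than anything pointing at
the DMA transfer itself.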

The problem with the "DMA timeout" issue is that it manifests itself in
hundreds of different ways.  Each case so far has to be handled on an
individual basis.

> FWIW, I'm using gjournal on 3 partitions in mirror/gm0.
>
> Here's my server's parts list:
> - Seagate ST31000340AS Barracuda 7200.11, 1TB, SATA (x2).

Can you please provide the output from the following commands?

dmesg
vmstat -i
atacontrol list
atacontrol cap ad4
atacontrol cap ad6
smartctl -a /dev/ad4
smartctl -a /dev/ad6
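
If it is easier to gather everything in one go, something along these
lines would work.  This is only a sketch: the collect() helper and the
report layout are made up for illustration, and it assumes Python 3.7 or
later is available.  smartctl comes from the sysutils/smartmontools
port, so it may not be installed; the sketch records that instead of
stopping.

```python
import subprocess

# The commands requested above. Adjust device names to match the system.
COMMANDS = [
    ["dmesg"],
    ["vmstat", "-i"],
    ["atacontrol", "list"],
    ["atacontrol", "cap", "ad4"],
    ["atacontrol", "cap", "ad6"],
    ["smartctl", "-a", "/dev/ad4"],
    ["smartctl", "-a", "/dev/ad6"],
]


def collect(commands):
    """Run each command and return one text report with labelled sections."""
    sections = []
    for argv in commands:
        header = "==> " + " ".join(argv)
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
            body = result.stdout + result.stderr
        except OSError as exc:
            # Raised when the program is missing or cannot be executed.
            body = f"(could not run: {exc})\n"
        sections.append(header + "\n" + body)
    return "\n".join(sections)


if __name__ == "__main__":
    print(collect(COMMANDS))
```

Run it as root and redirect the output to a file to attach to a reply.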

Thanks.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



