Date:      Fri, 1 Jul 2005 19:56:32 +0100
From:      Dominic Marks <dom@goodforbusiness.co.uk>
To:        freebsd-stable@freebsd.org, Tony Byrne <freebsd@byrnehq.com>
Cc:        mi@freebsd.org, alan@cyclopsvision.co.uk
Subject:   Re: SATA Problems - ATA_Identify timeout ERROR - using Tyan S5350
Message-ID:  <200507011956.33217.dom@goodforbusiness.co.uk>
In-Reply-To: <1148375339.20050701171213@byrnehq.com>
References:  <200507011534.BBK31161@C2bthomr05.btconnect.com> <200507011700.14321.dom@goodforbusiness.co.uk> <1148375339.20050701171213@byrnehq.com>

On Friday 01 July 2005 17:12, Tony Byrne wrote:
> Hello Dominic,
>
> DM> Do you see them with ATA mkIII as well?
>
> I tried the ATA mkIII patches a few weeks ago on one of servers,
> which was suffering DMA TIMEOUTs, but they made no difference.

Have you tried Linux on the same hardware? A strange suggestion
perhaps, but I suspect that something is wrong with the hardware or
its configuration. It might turn up something interesting; at the
very least it will rule bad hardware in or out.

> This problem may only occur with certain combinations of controller
> and SATA hard drive. For example, I have a workstation with an ICH5
> controller which had been frequently emitting TIMEOUT messages when
> it had a 80Gb 7200k Seagate Barracuda SATA drive installed. As a test

See below for what is possibly a very similar story.

> this afternoon, I swapped out the Seagate for a new 250Gb Western
> Digital SATA drive, and installed 5.4 RELEASE. The machine is now in
> the process of building world, and I've yet to see any TIMEOUTs.

I have lots of ICH5 and ICH6 systems and I haven't had any WRITE_DMA
errors on them (to my knowledge). All of my problems in this area are
to do with Sil3112-based cards. An 80GB Seagate drive attached to an
Adaptec "RAID" controller with the Sil3112 in it had absolutely terrible
performance, about 4MB/s (r+w) at best. Replacing the Seagate
disc with a Maxtor fixed the problem. I wasn't looking for these
messages at the time, so they may have been a factor. What I did notice
is that the drive (according to gstat) was improbably busy almost
all the time for the minor load I was placing on it. I have another
two Sil3112-based cards from no-name suppliers; these also
seem to have issues, but they vary quite a lot.

Just this week I have been setting up a SATA RAID using graid3 and
3x 250GB WD discs. In my test system I installed two of the Sil3112
cards, in addition to the integrated ICH6 onboard. Attaching a drive
to each controller gives good performance ~105MB/s peak according to
diskinfo. However, trying to use the RAID array in this configuration
results in one or other of the data drives having frequent
problems, including dropping out of the array at random during
periods of load because of write FAILUREs. This makes the array
useless, since it cannot sustain even a rebuild without one or
both of the data drives failing.

I never have any errors from the disc used for parity, which I put
on the ICH6. If I reconfigure the machine and put both data discs
on one of the Sil3112 controllers the system is stable and I can
work with it. Performance suffers quite a bit, dropping to a peak
of 98MB/s (ok, it's not much to cry about :-)). If I put the parity
drive on either of the Sil3112 controllers I normally get a write
failure, followed by a panic within minutes.

What concerns me more about this configuration is the speed
limit on a rebuild of the array. If I place a drive on each controller
the rebuild runs at ~50MB/s (from memory, might be a little less).
However, in the stable configuration the rebuild cannot pass 20MB/s,
which on a 500GB array makes quite a difference to the rebuild time.
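
To put rough numbers on that difference, here is a minimal sketch of
the arithmetic. It assumes decimal units (1GB = 1000MB), a constant
rate, and that the whole 500GB has to pass through at the quoted
speed; the function name is only illustrative, and a real rebuild
may touch less data or vary in speed.

```python
# Rough rebuild-time estimate from the rates quoted above.
# Assumptions: 1 GB = 1000 MB, constant throughput, all 500 GB processed.

def rebuild_hours(size_gb: float, rate_mb_s: float) -> float:
    """Return the hours needed to process size_gb at rate_mb_s."""
    seconds = size_gb * 1000 / rate_mb_s
    return seconds / 3600

ARRAY_GB = 500  # array size mentioned in the message

for label, rate in (("one drive per controller", 50), ("stable layout", 20)):
    print(f"{label}: {rebuild_hours(ARRAY_GB, rate):.1f} h at {rate} MB/s")
```

Under those assumptions the rebuild goes from just under three hours
to roughly seven.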

Incidentally, this configuration is only stable on one of the Sil3112
cards I have (the one without RAID). I have one with RAID (probably
almost exactly like the Adaptec), and one without.

If you're stuck with the hardware you have, like I am, I suggest that
you set up a gmirror of your two dodgy drives; if one of them happens
to encounter a read/write failure it's less likely to take the system
down. Hardly a fix though.

Mikhail, I hope you don't mind me CC'ing you here. I read in a message
that you had cured some problems like this by changing some of the
PCI timers (?). Could you possibly say a little about what you did?

> Regards,
>
> Tony.

Cheers,
-- 
Dominic
GoodforBusiness.co.uk
I.T. Services for SMEs in the UK.


