Date:      Tue, 30 Sep 2008 14:20:28 -0700
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Mel <fbsd.questions@rachie.is-a-geek.net>
Cc:        Reid Linnemann <lreid@cs.okstate.edu>, freebsd-questions@freebsd.org
Subject:   Re: SATA READ_DMA timeouts - SOLVED?
Message-ID:  <20080930212028.GA56646@icarus.home.lan>
In-Reply-To: <200809301929.27126.fbsd.questions@rachie.is-a-geek.net>
References:  <48E1465A.5040903@cs.okstate.edu> <20080930023736.GA22907@icarus.home.lan> <48E259B4.3040100@cs.okstate.edu> <200809301929.27126.fbsd.questions@rachie.is-a-geek.net>

On Tue, Sep 30, 2008 at 07:29:26PM +0200, Mel wrote:
> On Tuesday 30 September 2008 18:54:12 Reid Linnemann wrote:
> > Jeremy Chadwick wrote:
> > > (I'm not subscribed to freebsd-questions, so please CC me on replies.
> > > I'm also not sure how I ended up getting this mail in the first place;
> > > it looks like someone BCC'd my koitsu@freebsd.org address).
> >
> > Yes, I BCC'd you since you are maintaining a page on the wiki
> > documenting SATA DMA problems.
> >
> > > Furthermore, one of the most common reports on the FreeBSD lists is the
> > > exact opposite -- users complaining that "their disks are SATA300 but
> > > only operate at SATA150" (caused by that jumper).  Users are told to
> > > remove the jumper, and are reminded that the reason the jumper is
> > > enabled by default is said chipset incompatibilities.
> > >
> > > That said, your mail confuses me for one reason:
> > >
> > > Were you receiving DMA errors with the jumper REMOVED (e.g. SATA300
> > > operation), or with the jumper ENABLED (SATA150 operation)?  Your below
> > > description does not state what exactly you did with the jumper to make
> > > your drives work reliably, only "that the jumper capability on your
> > > disks was available".
> >
> > I should have been more clear.
> >
> > My disks came with no cap on the SATA150 jumper, although FreeBSD
> > reported that they were in SATA150 mode. The system would be unusable
> > from READ_DMA timeouts if the system was ever powered off and brought
> > back up. I had to do some voodoo of booting in single user mode with
> > ACPI turned off to repair filesystems and rebuild my gmirror, then load
> > ACPI and drop back into multi-user mode. I even had to do this if the
> > system was powered off gracefully. So far, since I capped the jumpers
> > this has not been the case. I still get them periodically if I do
> > something like rebuild a gmirror component, so I can no longer say my
> > problem is completely resolved.
> 
> Is this on 7.x? Sounds very similar to my experience described in:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=122572&cat=kern
> 
> The machine is now operational and working in UDMA33 mode with two gmirror'ed 
> SATA, using 6.3-p4. Unfortunately, I can't risk "trying 7.x" anymore, since 
> it's emergency storage for the main fileserver, so dataloss is 
> unacceptable :/. I do not know about the jumper state at the moment. I will 
> inform if there will be a window real soon now, to check for jumpers.
> 
> Ata info:
> # atacontrol list
> ATA channel 0:
>     Master: acd0 <HL-DT-STDVD-ROM GDR-T10N/1.02> ATA/ATAPI revision 5
>     Slave:       no device present
> ATA channel 1:
>     Master:      no device present
>     Slave:       no device present
> ATA channel 2:
>     Master:  ad4 <WDC WD6400AAKS-65A7B0/01.03B01> Serial ATA II
>     Slave:       no device present
> ATA channel 3:
>     Master:  ad6 <WDC WD6400AAKS-65A7B0/01.03B01> Serial ATA II
>     Slave:       no device present
> 
> # atacontrol cap ad4
> 
> Protocol              Serial ATA II
> device model          WDC WD6400AAKS-65A7B0
> serial number         WD-WMASY1885186
> firmware revision     01.03B01
> cylinders             16383
> heads                 16
> sectors/track         63
> lba supported         268435455 sectors
> lba48 supported       1250263728 sectors
> dma supported
> overlap not supported
> 
> Feature                      Support  Enable    Value           Vendor
> write cache                    yes      yes
> read ahead                     yes      yes
> Native Command Queuing (NCQ)   yes       -      31/0x1F
> Tagged Command Queuing (TCQ)   no       no      31/0x1F
> SMART                          yes      yes
> microcode download             yes      yes
> security                       no       no
> power management               yes      yes
> advanced power management      no       no      0/0x00
> automatic acoustic management  yes      yes     128/0x80        128/0x80
> 
> # atacontrol mode ad4
> current mode = UDMA33

No -- what Reid is reporting is very different.

His problem is that his disks came out-of-the-box operating at SATA300
speeds, and his SATA chipset does not work reliably with SATA300.  He
found that by setting the SATA150-limiting jumper, he achieved stability.

What you're seeing here (a SATA drive being limited to ATA33 speed)
could be due to one of the following things:

1) BIOS options have set the SATA ports to "Compatible" or "Emulated".
What this does is tell your southbridge to emulate the SATA disks as old
PATA disks, and I believe the emulation layer does use ATA33 (not
ATA66/100/133).  This is available so you can use SATA disks on very old
operating systems (possibly things like MS-DOS).

"Enhanced" means to run the disks and controller in a standard SATA
fashion.  "Enhanced" can also provide you extra functionality, such as
"Enhanced IDE", "Enhanced AHCI", or "Enhanced RAID".  It depends greatly
on the chip being used, and what features it has.

2) Board is using a SATA chipset which lacks a PCI ID table entry in
FreeBSD, yet is somehow operating in a "generic" fashion (I'm not
referring to generic AHCI either, although that could also apply here,
as ata(4) has "generic AHCI" support).

3) Board is using a SATA chipset which has a PCI ID entry in the table,
but lacks the actual code in ata(4) that interfaces with it.

In the case of items #2 and #3, the results are mixed.

Some people have reported that when "UDMA33" is shown with SATA disks,
it's purely cosmetic -- that is to say, the actual transfer speed can
exceed 33MByte/sec.  A series of "dd" tests reading/writing to the
disk should be sufficient to determine this.
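
For those who would rather script the check than eyeball dd output, here
is a minimal sketch of the same sequential-read measurement in Python.
The function names, the 1 GiB read size, and the 33.3 MByte/sec
threshold (the nominal ceiling of UDMA mode 2) are my own assumptions
for illustration, not anything taken from ata(4).  Run it against the
raw device (e.g. /dev/ad4, as root); reading a regular file will mostly
measure the filesystem cache instead of the disk.

```python
import os
import time

# Nominal ceiling of UDMA mode 2 ("UDMA33"), in MByte/sec (assumed threshold).
UDMA33_LIMIT_MB_S = 33.3


def read_throughput_mb_s(path, total_bytes=1024 * 1024 * 1024,
                         block_size=1024 * 1024):
    """Sequentially read up to total_bytes from path; return MByte/sec.

    MByte here means 10**6 bytes. Reading stops early at end of file.
    """
    done = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.monotonic()
        while done < total_bytes:
            chunk = os.read(fd, min(block_size, total_bytes - done))
            if not chunk:
                break  # end of file or device
            done += len(chunk)
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    # Guard against a zero elapsed time on very small reads.
    return done / max(elapsed, 1e-9) / 1e6


def exceeds_udma33(mb_per_s, limit=UDMA33_LIMIT_MB_S):
    """True if the measured rate is above what real UDMA33 could deliver."""
    return mb_per_s > limit
```

If read_throughput_mb_s("/dev/ad4") comes back well above the limit on
repeated runs, the "UDMA33" label is most likely cosmetic; if it hovers
around or below 33 MByte/sec, the disk is probably really being driven
at that speed.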

If "UDMA33" is printed and the actual transfer speed *is* in fact
limited to ATA33 rates, that is a strong indicator that FreeBSD lacks
the code to initialise/handle your SATA chipset correctly, and is
defaulting to UDMA33.

If that's the case, I'd recommend working with the ata(4) folks (I can
point you to them) to get support added for your chip.  Otherwise,
support will either be added many years from now when someone else
points it out, or will never get added at all.

You didn't provide any dmesg output so I can't tell what SATA chipset
or motherboard you're using.

Many ATA and SATA chips have been added to RELENG_7, and I doubt the
changes will be backported to RELENG_6.

It would be worthwhile to boot a RELENG_7 LiveCD ISO and see whether
your disks are detected -- and if so, whether they show up at SATA
speeds.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



