Date:      Wed, 10 Aug 2005 08:31:59 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-stable@freebsd.org
Subject:   Re: ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Message-ID:  <20050810133159.GA10150@FS.denninger.net>
In-Reply-To: <6.2.1.2.0.20050810081251.05298ff0@64.7.153.2>
References:  <42F7F7E8.1020507@mail.uni-mainz.de> <42F9009E.3030601@mac.com> <42F9609E.1010207@goldsword.com> <20050810023111.GA2913@FS.denninger.net> <20050810024618.GA8198@drjekyll.mkbuelow.net> <6.2.1.2.0.20050810081251.05298ff0@64.7.153.2>

On Wed, Aug 10, 2005 at 08:15:50AM -0400, Mike Tancsa wrote:
> At 10:46 PM 09/08/2005, Matthias Buelow wrote:
> >Karl Denninger wrote:
> >
> >>SII chipsets were ok in 4.x, but the newer ATA code broke badly with them.
> >>I've had a PR open on this since February, and many others have reported
> >>similar issues.  The problems still exist in the 6.x-BETA releases I've
> >>checked out, and are in some cases MORE severe (for me anyway) than they are
> >>in 5.4.
> >
> >Well, it doesn't affect just the SII chips.. I see the same on an
> >Intel ICH6 chipset but never after the kernel has mounted the root
> >fs. Sometimes it takes several attempts until it manages to do so,
> >though. The machine works w/o any such problems on other OSes. I've
> 
> I have ICH6 boxes and they run just fine without issue.  Have you checked 
> to see if it actually has bad sectors or is a problem with your tray (if 
> you use one)
> 
> [verify1]% dmesg | grep -i ich
> uhci0: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-A> port 0xe000-0xe01f at device 29.0 on pci0
> usb0: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-A> on uhci0
> uhci1: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-B> port 0xe100-0xe11f at device 29.1 on pci0
> usb1: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-B> on uhci1
> uhci2: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-C> port 0xe200-0xe21f at device 29.2 on pci0
> usb2: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-C> on uhci2
> uhci3: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-D> port 0xe300-0xe31f at device 29.3 on pci0
> usb3: <Intel 82801FB/FR/FW/FRW (ICH6) USB controller USB-D> on uhci3
> fxp0: <Intel 82562EZ (ICH6)> port 0xd000-0xd03f mem 0xd0000000-0xd0000fff irq 10 at device 8.0 on pci1
> atapci0: <Intel ICH6 SATA150 controller> port 0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.2 on pci0
> [verify1]%

I have an ICH5 on my motherboard and it works fine - it is under heavy use
and has had no trouble.

atapci1: <Intel ICH5 SATA150 controller> port 0xfea0-0xfeaf,0xfe30-0xfe33,0xfe20-0xfe27,0xfe10-0xfe13,0xfe00-0xfe07 irq 18 at device 31.2 on pci0

There are two potential issues here - failing during the boot process
exercises a completely different set of code - and thus is a different
potential problem - than failing while running.

I've not had booting problems with the ICH5, and no failures while running.

On the other hand, the SII chipset boards I have (two of them) can be
RELIABLY forced to fail within minutes.  If none of the drives in the
mirror set are attached to some other controller, the data on the disk(s)
is toast - if some are, the SII-attached disks detach and the
non-SII-attached disks end up carrying the data.
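
As a rough illustration of how such failures could be tallied per disk, here
is a minimal Python sketch.  It is not part of the original report.  The
pattern is modeled only on the one kernel message quoted in the subject line
of this thread, and the function name collect_icrc_errors and the sample log
text are hypothetical; other ATA error messages may use a different layout
and would not be matched.

```python
import re
from collections import defaultdict

# Modeled on the single message quoted in this thread's subject line:
#   ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
# Assumption: other ICRC warnings follow the same layout.
ICRC_RE = re.compile(
    r"\b(?P<dev>ad\d+): WARNING - (?P<cmd>\S+) UDMA ICRC error.*?LBA=(?P<lba>\d+)"
)


def collect_icrc_errors(log_text):
    """Return a dict mapping device name (e.g. 'ad10') to a list of LBAs."""
    errors = defaultdict(list)
    for line in log_text.splitlines():
        match = ICRC_RE.search(line)  # search() tolerates a syslog prefix
        if match:
            errors[match.group("dev")].append(int(match.group("lba")))
    return dict(errors)


if __name__ == "__main__":
    # Hypothetical sample; in practice this would be dmesg or
    # /var/log/messages content.
    sample = (
        "ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) "
        "LBA=11441599\n"
    )
    for dev, lbas in sorted(collect_icrc_errors(sample).items()):
        print(dev, len(lbas), "ICRC warning(s)")
```

Grouping by device name makes it easy to compare disks attached to the SII
controller against disks attached to anything else in the same machine.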

This is across two different manufacturers of drives (Hitachi and Maxtor)
and FOUR separate disks, all four of which smartmontools bless as operating
properly and all of which ran just fine under 4.x.

Oh, and all of which work just fine on a 3ware 8502 card.

I've read the reports that basically boil down to "the SII chipset sucks,
don't use it" BUT (1) it works under 4.x, (2) it works under other operating
systems and (3) the FreeBSD folks who are saying it doesn't work don't have 
the courage of their statements to make them in the official release 
documents (e.g. the release notes, hardware compatibility guide or errata.)
So while the chipset may or may not be "less desirable", what is clear is 
that the problems with it are not insurmountable - they've just not been 
taken care of in the newer ATA code.

Arguments that this is about resources (e.g. the developers don't have a
card and need anything from a board to a complete system to have any chance 
of fixing it) ring pretty hollow to me.  This is an EXTREMELY popular chipset,
is on both the Adaptec and Bustek cards commonly sold with machines and at
retail, and cards with that chipset can be had for as little as $30 (and up,
of course.)  In addition I've yet to find a SATA drive that WON'T fail with 
this board - or a motherboard that is stable with it on FreeBSD 5.4 or 6.x - 
it is definitely NOT linked to the drive and I have no confidence it's linked 
to the motherboard chipset in any way. Further, smartmontools says the disks 
that do fail aren't defective, and it worked just fine under 4.x.  

Also, I've yet to see a developer commit on the list that they WILL fix it if 
such a controller board is forthcoming (and will return the board when they're 
done) - I've got two of these cards here (choose between Adaptec and Bustek) 
and would be happy to UPS one to someone IF I had a firm commitment that 6.x 
would NOT go out without this being addressed and that the board would be 
returned to me when work was complete.

Finally, while the 3ware card works fine, it doesn't support hot plug.  The 
SII chipset claims to, and so does the ATA code, but the 3ware card runs on 
a different driver - which doesn't (either claim to or actually accomplish 
it.)

So while using a 3ware card solves the "blows chunks and dies" problem, you
are back to the lack of functionality that was present in 4.x - no hot drive
swap support.  (This is mitigated somewhat by the 3ware management tools,
which do allow reconnection and work - but it's a manual operation.)  This
entirely voids the argument for ATAng being a "step forward" - support of 
hot plug and other functionality improvements - in the first place, since
you can't actually USE that capability if you are forced to a 3ware board!

Again, I think the ATA-xx issues are of a magnitude sufficient to basically
kill FreeBSD going forward in the desktop application arena.  If all FreeBSD
as an organization cares about is the large server marketplace, I guess that
works.  But small office / home office file servers are going to be SATA
based, and moderate-data systems with low entropy (e.g. 300-600GB) are FAR 
more cost effective to deploy on SATA than on SCSI, and easily meet the 
performance and data stability requirements.  If FreeBSD is unstable on 
those systems without putting in specialized, vendor-supported hardware, 
then FreeBSD may well be ceding those segments of the Unix marketplace 
to something else (e.g. Linux.)

I believe that would be most unfortunate - I have been supporting FreeBSD 
as a platform for the code I sell in these environments exclusively for 
more than five years, and ran it as the only Unix OS we used at the ISP 
I used to own.  My stance has been for the last five years (since selling my
ISP) that if you want me to support code that I sell you have to be running 
it on FreeBSD.  

FreeBSD has earned this position by being a superior solution in all
respects, but most particularly in the area that is most important - 
operating stability.

If I start having customers run into stability problems with 5.x and beyond
on hardware that properly worked under 4.x, I will be forced to port the 
code over to Linux, as I cannot force people to run 4.x as the base OS when 
it's been EOL'd (other than for security fixes) and yet their hardware 
simply doesn't work right with the current FreeBSD code.

As things stand I am adding a STRONG warning in my product release notes 
stating that if you have a SII chipset SATA controller and run any version 
of FreeBSD from 5.4 onward you are doing so at your own risk and against my 
specific recommendations.

The warning will be removed from my products when the PR that I filed in
February is addressed OR FreeBSD places an equally strong warning in their
release notes, rendering the warning unnecessary.

-- 
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net	My home on the net - links to everything I do!
http://scubaforum.org		Your UNCENSORED place to talk about DIVING!
http://genesis3.blogspot.com	Musings Of A Sentient Mind




