Date:      Thu, 21 Jul 2005 15:22:34 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-stable@freebsd.org
Subject:   Re: Quality of FreeBSD
Message-ID:  <20050721202234.GA62615@FS.denninger.net>
In-Reply-To: <6.2.1.2.0.20050721153750.0851fab0@64.7.153.2>
References:  <200507211803.j6LI34dV005050@ferens.net> <20050721194500.W9208@fledge.watson.org> <20050721192613.GA61902@FS.denninger.net> <6.2.1.2.0.20050721153750.0851fab0@64.7.153.2>

On Thu, Jul 21, 2005 at 03:51:13PM -0400, Mike Tancsa wrote:
> At 03:26 PM 21/07/2005, Karl Denninger wrote:
> >Ok, Robert, but then here's the question....
> >
> >How come the ATA code which was very stable in 4.x was screwed with in a
> >production release, breaking it, with no path backwards to the working
> >code?
> 
> I understand your frustration, but if the changes were not made, others
> would say (and have) "How come modern and common hardware like XXXX does
> not work with FreeBSD?  The driver is old and unmaintained and does not
> support feature YYYYY."  I don't see Soren's work as 
> "screwing with production drivers" as opposed to him re-writing them to 
> take advantage of modern hardware designs.  Unfortunately along the way 
> some things might break.  They have for me, but that sometimes happens in 
> open source (and commercial code too for that matter).
> 
>         ---Mike 

ATA-NG (Soren's new code) is not (from what I understand) in the 5.x 
codebase.  One bone of contention is that apparently it IS in -HEAD, but 
there are no plans to MFC it to 5.x. 

My understanding is that the 5.x code is a half-baked version of ATA-NG,
and IMHO it had no business going into a PRODUCTION release in the state
that it was pushed over.

The decision path on including half a loaf in this case is not something I 
was privy to - but I've certainly been "privy" to the results!  I fought
with unsolicited detachments of drives claimed to be "defective" (when
they were and are not) and several crashes when the only remaining "good"
device on the mirror was also declared "bad" - some of which came with
filesystem data corruption - for over a month before I came up with a
configuration that gives me both RAID 1 data protection and REASONABLE
stability (meaning I have uptimes which are not controlled by unsolicited 
crashes!)

I am however VERY leery of following -STABLE, since there are reports here
on the list that more recent versions than what I'm running may have
regressed once again.  

I DEFINITELY do not want to go through what I did back in the first part 
of the year again.

Given that we were all "strongly" encouraged to upgrade to 5.x for production
machines a few months ago it was a truly ugly surprise to find that current 
production hardware which ran just fine on 4.x was hosed to the point of 
unusability with 5.x as a consequence of serious (some would say CRITICAL)
driver issues.

Whether the full ATA-NG code actually fixes the problem is (to me anyway)
unknown - but I am not about to devote a bunch of testing time to it when
it's in a codebase that I can't run AND it has been stated that there is 
no intent to MFC it.

Now if there was a commitment to MFC the code I would be happy to engage 
in testing against -HEAD, and see if I can provoke the same sort of 
misbehavior I get on 5.x.

Without that commitment, however, testing it is fruitless for me, since 
I have no path out of the box I'm in other than "sit on hands and wait an
indeterminate amount of time", and this testing involves a significant
time commitment - I not only have to replicate the 5.x production machines 
I've got in the field that have had trouble (not too hard), I also have to 
generate a synthetic load sufficient to know if the problem is truly 
resolved or not (that will take some effort.)

I've come up with a workaround that is "functional" for my production
systems, but that workaround came only with a huge time investment and 
IMHO this is a stability deficit that simply should not have happened.  

In the time I've run FreeBSD (going back a LONG ways, including using it
as the OS of choice behind a major regional ISP in the mid-late 90s) this 
is the worst instance of regression in terms of stability across purported 
"RELEASE" versions I've seen - for it to be "pooh-poohed" and outstanding
PRs effectively ignored for six months is IMHO quite a black eye event.

-- 
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net	My home on the net - links to everything I do!
http://scubaforum.org		Your UNCENSORED place to talk about DIVING!
http://homecuda.com		Emerald Coast: Buy / sell homes, cars, boats!
http://genesis3.blogspot.com	Musings Of A Sentient Mind




