Date: Thu, 21 Jul 2005 14:29:08 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: MikeM
Cc: freebsd-stable@freebsd.org
Subject: Re: Quality of FreeBSD
Message-ID: <20050721135839.K97888@fledge.watson.org>
In-Reply-To: <200507210850530519.03A3275D@sentry.24cl.com>

On Thu, 21 Jul 2005, MikeM wrote:

> Your comment presupposes that most of the bugs are specific to one piece
> of hardware; I doubt that is a valid assertion. I would offer that most
> of the bugs are not present in source code specific to a certain piece
> of hardware, but are present in source code that is run across much of
> the hardware that FreeBSD runs on. As such, it is just a matter of
> setting up the correct QA testing scripts to catch the bugs.
>
> Once a bug is reported, and that bug can be reproduced on the hardware
> of the development team, then that bug should never reappear, because
> there should be a testing script written for it.
>
> Additionally, every software bug is not only a defect in the software,
> but it also represents a defect in the process that created the
> software. Bugs should be looked at to analyze why they occurred, and
> what in the process might be changed to prevent the same or similar
> bugs from recurring.

Some of us have actually spent quite a bit of time looking at the defect
sets reported for 5.x. Depending on the release, they fall into a number
of categories, but here are the major ones I've identified (sketches
illustrating a few of the mechanisms follow the list):

- ACPI-related hardware probe issues, especially in earlier 5.x releases,
before the ACPI code (especially the Intel vendor code) learned how to
work around common ACPI BIOS bugs. The source of these problems was often
that BIOS ACPI code contained work-arounds for Windows ACPI bugs. Newer
5.x releases have blacklists of known bad BIOSes, workarounds for
specific bugs, etc., and this is now a much less frequently reported
problem. These problems weren't present in 4.x because 4.x didn't support
ACPI at all; on the other hand, there's a broad range of modern server
hardware that now requires ACPI to boot, so 4.x didn't run on that
hardware, or supported it poorly. After a very large effort, ACPI
problems are massively reduced.

- ATA problems. Many of these, while a symptom of bugs in the ATA code
running without Giant, were very specific to timing, or to divergent or
low-quality ATA hardware. As a result, they were difficult to reproduce
in any environment but the original reporting environment; the same
hardware might perform fine in a FreeBSD developer's system. Many of
these problems have now been resolved, but some have not. As often as
not, the problems have to do with retrying requests to drives. As I
mentioned, we believe the ATA code in 6.x is much more resilient, but
what it needs right now is testing, not merging to 5.x. Fixes require
just as much testing as any other change, since a fix for one issue may
well trigger another, especially in the world of cheap PC hardware.

- Network stack stability under high load, especially on SMP. Many of
these bugs had to do with exercising timing and race conditions
"precisely right", and involved workloads outside the standard set of
tests performed. In many cases, those workloads have now been added to
the regression test suite. For example, there were a number of race
conditions relating to the closing of sockets and network stack teardown
in the protocols. These tended to turn up on systems running tens of
thousands of rapidly opening and closing TCP connections on SMP
hardware. Reproducing those conditions is difficult, and not something
most FreeBSD developers have the resources to do, so we have to wait for
bug reports from people who do have those resources. However, over the
past 12 months we've been working to put together a "netperf" test
cluster, using hardware donated by a number of organizations, including
the FreeBSD Foundation, FreeBSD Systems, and IronPort Systems, as well
as network connectivity and management donated by Sentex Communications.
This has allowed us to apply network tests in higher-performance
environments, and to make high-end SMP hardware available to a broader
range of developers.

- Storage and file system related buffer starvation, deadlocks, etc.,
most of them a result of the development of snapshots and bgfsck
support, changes in the I/O path, and so on. A number of these have
turned out to be driver bugs, but a fair number (especially in the 5.2
time frame) had to do with resource management in the UFS code. Some
still remain.

- Lock and resource leak crashes, especially with 5.2 and 5.3, when
large parts of the system moved from running under Giant to running
without it. Our process has definitely improved here, through improved
lock debugging tools, increased use of assertions, and the advent of
things like Coverity's static analysis tools being run over the source
tree.

- ACPI-like problems having to do with migrating interrupt and hardware
configuration models. These usually manifest as interrupt storms. The
underlying changes are required to support modern server-class SMP
hardware, but they often trigger bugs in a range of motherboard
revisions from about 2-3 years ago. Sometimes, fixing these problems has
required figuring out how Windows does the same thing, since only the
behavior used by Windows is tested by the hardware vendor. Go figure.

- Threading bugs associated with the creation of a new threading model
and library, especially from 5.3. Many of these have now been resolved,
although further work on performance is on-going.
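To give a flavor of what the BIOS blacklist approach looks like, here's a
minimal userland sketch: a table keyed on the OEM ID a BIOS reports,
consulted before trusting the firmware. The entries, flag names, and
match criteria below are made up for illustration; FreeBSD's real quirk
handling lives in the ACPI attachment code and is more involved.

#include <stdio.h>
#include <string.h>

/* Hypothetical quirk flags, for illustration only. */
#define QUIRK_NONE      0x00
#define QUIRK_NO_ACPI   0x01    /* ACPI tables too broken to use */
#define QUIRK_NO_CSLEEP 0x02    /* sleep states unreliable */

struct bios_quirk {
	const char *oem_id;     /* OEM ID reported in the table headers */
	int rev_max;            /* apply to this BIOS revision and older */
	int quirks;
};

static const struct bios_quirk quirk_table[] = {
	{ "BADBIOS", 0x0105, QUIRK_NO_ACPI },   /* made-up entries */
	{ "FLAKYCO", 0x0210, QUIRK_NO_CSLEEP },
	{ NULL, 0, QUIRK_NONE }
};

static int
lookup_quirks(const char *oem_id, int rev)
{
	const struct bios_quirk *q;

	for (q = quirk_table; q->oem_id != NULL; q++)
		if (strcmp(q->oem_id, oem_id) == 0 && rev <= q->rev_max)
			return (q->quirks);
	return (QUIRK_NONE);
}

int
main(void)
{
	if (lookup_quirks("BADBIOS", 0x0100) & QUIRK_NO_ACPI)
		printf("ACPI disabled by blacklist for this BIOS\n");
	return (0);
}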
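Since the retry behavior keeps coming up, here's a rough, self-contained
sketch of the shape of that problem: a bounded retry loop with a channel
reset between attempts. The issue_request() and reset_channel() functions
are hypothetical stand-ins for a real driver's request and reset paths,
not the actual ATA code.

#include <stdio.h>

#define MAX_RETRIES 3

/* Simulate a device that only answers on the third attempt. */
static int
issue_request(int attempt)
{
	return (attempt < 2 ? -1 : 0);
}

static void
reset_channel(void)
{
	printf("resetting channel before retry\n");
}

static int
submit_with_retry(void)
{
	int attempt;

	for (attempt = 0; attempt < MAX_RETRIES; attempt++) {
		if (issue_request(attempt) == 0)
			return (0);     /* success */
		/*
		 * The hard part in a real driver: was the timeout the
		 * drive's fault, the controller's, or ours? Resetting
		 * can recover a wedged device, but may disturb other
		 * devices on the same channel.
		 */
		reset_channel();
	}
	return (-1);    /* give up, fail the I/O upward */
}

int
main(void)
{
	printf("request %s\n",
	    submit_with_retry() == 0 ? "succeeded" : "failed");
	return (0);
}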
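The kind of workload that flushed out the socket teardown races is easy
to describe: open and close TCP connections as fast as possible, in many
processes at once, on SMP hardware. A minimal sketch of such a stress
loop follows; it assumes something is already listening on
127.0.0.1:8080, and the port and iteration count are arbitrary.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <string.h>
#include <unistd.h>

#define PORT       8080     /* arbitrary; assumes a listener is here */
#define ITERATIONS 10000

int
main(void)
{
	struct sockaddr_in sin;
	int i, s;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(PORT);
	sin.sin_addr.s_addr = inet_addr("127.0.0.1");

	/*
	 * Open and close connections as fast as possible. Run many
	 * copies of this in parallel to exercise the close/teardown
	 * paths under real contention.
	 */
	for (i = 0; i < ITERATIONS; i++) {
		s = socket(AF_INET, SOCK_STREAM, 0);
		if (s < 0)
			err(1, "socket");
		if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0)
			warn("connect");
		close(s);
	}
	return (0);
}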
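On the assertions point: in the kernel we can use mtx_assert(9) to have a
function verify that its caller holds a given mutex, so locking protocol
violations blow up immediately in debug kernels rather than corrupting
data quietly. Here's a rough userland analogue of the idea using
pthreads; it is a sketch of the discipline, not the kernel API.

#include <assert.h>
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_t m_owner;       /* valid only while m_held is set */
static int m_held;

/* Rough analogue of mtx_assert(&m, MA_OWNED). */
static void
assert_owned(void)
{
	assert(m_held && pthread_equal(m_owner, pthread_self()));
}

static void
lock_m(void)
{
	pthread_mutex_lock(&m);
	m_owner = pthread_self();
	m_held = 1;
}

static void
unlock_m(void)
{
	m_held = 0;
	pthread_mutex_unlock(&m);
}

/* A function whose locking contract is "caller holds m". */
static void
update_shared_state(void)
{
	assert_owned();     /* catch protocol violations at the door */
	/* ... modify data protected by m ... */
}

int
main(void)
{
	lock_m();
	update_shared_state();  /* fine: we hold the lock */
	unlock_m();
	/* Calling update_shared_state() here would abort via the assert. */
	return (0);
}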
While there's a lot to be said for software engineering and improving
practices, and we are working on that in a number of directions, I think
it is inaccurate to claim that software defects always represent solvable
process defects. The reality is that all known software development
processes produce software containing defects, especially for complex
software systems. In operating system development, detecting a bug is
often a property of very specific environments that are hard or
impossible to reproduce or test in a formal way, at least not without
huge expense. Many bugs cannot be reproduced on hardware owned by
developers, because the cost (not to mention the heat, time, and power
constraints) would be excessive. Many network testing environments
require several computers to reproduce even simple scenarios, along with
substantial configuration work.

Combinatorics is also a practical reality, and it doesn't work in our
favor. For example: Soren does most of our ATA development. Last I
checked, he had hundreds of ATA adapters, hundreds of storage devices,
and a good dozen or more test computers with various properties (speed,
notebook or not, SMP, hardware architecture, bus topology, BIOS revision,
and so on). Assuming the general accuracy of those numbers, and assuming
it took only five minutes per configuration to perform all testing, we're
talking about 83 days of non-stop, 24-hour testing. And of course the
basic assumptions are flawed, because it takes at least five minutes just
to set up a hardware configuration, let alone run the hours of testing
that you'd actually want to run on it. These are not problems that are
solved by test scripts; they are fundamental to the general problem of
testing device drivers and device driver frameworks.
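To make the back-of-the-envelope arithmetic concrete: the inventory above
is only given as "hundreds" of adapters and devices, so the exact counts
below are illustrative, but pairing 200 adapters with 10 representative
devices on each of 12 machines gives roughly the figure quoted.

#include <stdio.h>

int
main(void)
{
	/* Illustrative counts; the text only says "hundreds". */
	int adapters = 200, devices = 10, machines = 12;
	int minutes_per_config = 5;

	long configs = (long)adapters * devices * machines;
	long minutes = configs * minutes_per_config;

	printf("%ld configurations, %.1f days of round-the-clock testing\n",
	    configs, minutes / 60.0 / 24.0);
	/* Prints: 24000 configurations, 83.3 days ... */
	return (0);
}

Scale any of those counts up toward the full cross product and the time
balloons from months into years, which is exactly the point.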
By doing a subset of the above testing, you get software that is reliable
on most hardware. And indeed, that's what FreeBSD ships: software that is
reliable on most hardware.

A number of people have been working to improve FreeBSD testing. If
you're interested in working on it (you obviously have clear views on how
it should work), we'd welcome your help, especially when it comes to
defining test cases and producing scripts or test tools that can be run
mechanically. There have been a couple of concerted efforts to formalize
our regression test framework and, in particular, to standardize how
tests are included, documented, and run. Most of these efforts peter out,
because the work is hard, offers few rewards, and the attention spans of
developers are limited by reality (jobs, etc.). I think everyone agrees
that this is an area where there's more work to be done, and
contributions are welcome.

Robert N M Watson