From: martinko
To: freebsd-stable@freebsd.org
Date: Thu, 04 May 2006 01:38:24 +0200
Subject: Re: quota deadlock on 6.1-RC1
In-Reply-To: <20060503110503.O58458@fledge.watson.org>

Robert Watson wrote:
>
> On Tue, 2 May 2006, Kris Kennaway wrote:
>
>>>>> Ditto, same thing with the recent nve fixes. Why release known broken
>>>>> code when there are tested patches available? What's the worst that
>>>>> will happen? It won't work? That's already the case...
>
> <...>
>
>>
>> OK, I can't speak to that issue specifically.
>>
>> Generally, though, the worst that can happen is "you fix one problem
>> affecting a subset of users and replace it with a larger problem
>> affecting a larger subset of users".
>>
>> If there's doubt about the impact of a change, 10 seconds before the
>> release is not the appropriate time to cram it in.
>
> <...>
>
> I just want to comment a bit on this issue, because I've seen a number
> of posts on FreeBSD mailing lists over the last few years that suggest
> that there may be some misunderstandings about software development and
> release processes.
>
> The invariant that needs to be understood is that all software is buggy.
> Arguments have been made that the number of bugs increases linearly with
> code size, and also that it increases with code complexity, so you can
> see a non-linear increase in bugs as code grows. This means that you're
> talking about several bugs per thousand lines of code in most software,
> and for code that contains millions of lines (such as the FreeBSD
> kernel, Linux kernel, Apache, PHP, MySQL, PostgreSQL, Windows, Word,
> iTunes, etc.), you're talking thousands or tens of thousands of bugs.
> And that's in a static version of the code, not even taking into account
> new features in an active code base that are still being "debugged"!
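To make the bug-count arithmetic above concrete, here is a back-of-the-envelope sketch. The density of 3 bugs per thousand lines and the 1.2 exponent in the superlinear variant are illustrative assumptions, not measured figures for FreeBSD or any other project, and the function names are hypothetical.

```python
"""Back-of-the-envelope bug-count estimates (illustrative assumptions only)."""


def estimated_bugs(lines_of_code: int, bugs_per_kloc: float) -> float:
    """Linear model: bug count grows in proportion to code size."""
    return lines_of_code / 1000 * bugs_per_kloc


def estimated_bugs_superlinear(lines_of_code: int, bugs_per_kloc: float,
                               exponent: float = 1.2) -> float:
    """Hypothetical superlinear model standing in for the "bugs grow with
    complexity" argument. At exactly 1,000 lines it matches the linear model;
    above that it grows faster than code size."""
    kloc = lines_of_code / 1000
    return bugs_per_kloc * kloc ** exponent


if __name__ == "__main__":
    # Assumed density: 3 bugs per 1,000 lines ("several bugs per KLOC").
    for loc in (100_000, 1_000_000, 5_000_000):
        linear = estimated_bugs(loc, 3)
        superlinear = estimated_bugs_superlinear(loc, 3)
        print(f"{loc:>9,} lines: ~{linear:,.0f} bugs (linear), "
              f"~{superlinear:,.0f} bugs (superlinear)")
```

Even the linear model puts a million lines at that assumed density at about 3,000 bugs; the superlinear variant only makes the numbers larger, which is the point of the argument.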
>
> Bugs fall into a lot of different categories, but from the perspective
> of risk management, it's useful to think of them in two categories:
> latent bugs, which are unreported, unobserved, or occur only in
> exceptional or generally untriggered circumstances, and non-latent bugs,
> which have been reported, are triggered in practice, etc. The tricky
> ones are the latent bugs, because you may not know that they are there,
> or you may know that they are there but they trigger so infrequently or
> in such unusual edge cases that they might as well not be there.
>
> Release engineering is really about two things: structuring/nurturing
> the process of developing releases (tracking issues, identifying people
> to fix them, testing, branch management, building, etc.), and risk
> management. The risk management aspect is that you want to improve the
> quality of the release by taking actions, typically adopting source
> changes, which may improve testing results. Each change potentially
> affects both visible and latent bugs. A bug fix in one piece of code
> may change its timing, its side effects, or its undocumented
> assumptions, or simply allow access to code that was previously not
> executed because the bug prevented it. If you allow a bug fix into the
> tree, you risk uncovering new bugs. So the choice isn't "Accept a bug
> fix or not", it's "Will accepting this bug fix generally improve or
> reduce the quality of the release?" -- i.e., will the change fix the
> bug it is claimed to fix, and will it result in lots of latent bugs
> suddenly becoming visible?
>
> Particularly with hardware drivers like nve, this is non-trivial,
> because the behavior of the hardware is very subtle, there's lots of
> variety in the shipped hardware, and the vendor is (or appears to be)
> highly unsupportive. The result is that if you tweak a register or a
> minor piece of behavior, it may dramatically improve support for a
> particular piece of hardware, but break all the rest.
> The only way to mitigate this risk is through extensive testing, and
> extensive testing takes a lot of time. And by a lot of time, I mean a
> long release cycle. So if we want to adopt a fix that is high risk --
> i.e., one believed to interact in subtle ways that affect different
> machines differently -- we need to make the change early in the release
> cycle, not at the end. If we make it at the end, we are shipping code
> that is effectively untested on a large number of systems. Sure, it
> will fix one, but if it breaks the rest, is it worth it? The only
> alternative is to restart the testing process, which in the case of
> high-risk drivers means adding months to the release cycle.
>
> And you can see where this is leading: if you significantly delay the
> release cycle for each minor bug, you will never release. At some
> point, you have to make the decision "although this release isn't
> perfect, we'll never release if we don't ship now". I know that sounds
> like a bad thing, but you'll find that this practice is not only found
> in every part of the software industry, but is also impossible to
> avoid, since bugs themselves are impossible to avoid.
>
> When you look at the RC2 release notes Scott recently sent, he
> identifies four bugs that he believes won't be fixed in time for the
> release. He reached that conclusion through risk management: each
> reported bug likely represents several underlying bugs in highly
> complex code. This means that they will take a significant amount of
> time to fix, and that each fix is high risk, as it is likely to reveal
> latent bugs. So each fix will require a lot of testing -- months of
> testing, in fact. The choice, then, is really: do we release 6.1, or do
> we skip it and do a 6.2 in a few months? As the release engineer, Scott
> has concluded that releasing now offers a great benefit to many people,
> although the bugs present may penalize some.
> Mind you, in some cases the bugs also exist in 6.0, so they don't
> represent regressions so much as bugs that persist. I agree with his
> conclusion: things like locking interactions in VFS are incredibly
> complicated, requiring extensive analysis and work to fix and test.
> Trying to fix them for 6.1 is unrealistic. They can be fixed in the
> next few weeks, tested for a month or two, and then merged to the
> RELENG_6_1 branch as errata fixes, similar to security advisories.
>
> It's all about trade-offs. People are welcome to (and frequently do)
> disagree with our analysis and choice on the trade-offs, but what I'm
> trying to emphasize in this e-mail is that these trade-offs are a
> reality. They can't be ignored: bug-free releases of software can't be
> shipped because they don't exist, and therefore the argument (decision)
> is always about where the right balance is. Arguing for waiting to ship
> until every last bug is fixed is arguing never to release software --
> bugs are present in all software, and not all of them are latent
> either -- that's why products have errata notes (as does FreeBSD),
> patch levels, etc. Don't take this to mean that we don't think fixing
> bugs is important, or that we don't spend long days and nights (and
> more days and more nights) working on it.
>
> FWIW, if you look at the release process of any other commercial or
> open source software product, you'll see the same thing. Either there's
> no bug database, or there's a very large database. If there's no
> database, it's because the developer isn't being honest about there
> being bugs, or they have no testing. If there's a huge database, they
> are being honest, and not all of those bugs are going to be fixed
> before they ship. Software authors select bugs to fix based on the
> impact of the bugs and their ability to fix them. I'd like to think we
> care more than some, but caring isn't enough to make computer software
> development perfect, or it would have happened a long time ago :-).
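The accept-or-defer decision for a late fix, as described above, can be sketched as a toy expected-value comparison. Everything in it is a hypothetical illustration of that trade-off: the class and field names, the user counts, the probabilities, and the assumption that testing time proportionally reduces the chance a regression ships. It is not how FreeBSD release engineering actually scores changes.

```python
"""Toy model of accepting a late fix vs. deferring it (hypothetical numbers)."""
from dataclasses import dataclass


@dataclass
class ProposedFix:
    users_helped: int            # users hit by the bug this fix targets
    p_fix_works: float           # chance the fix really resolves that bug
    users_exposed: int           # users running the touched code (e.g. every nve card)
    p_regression: float          # chance the change exposes a latent bug for an exposed user
    weeks_needed_to_test: float  # testing time needed to trust the change (must be > 0)


def expected_net_benefit(fix: ProposedFix, weeks_left: float) -> float:
    """Expected users helped minus expected users hurt if the fix ships."""
    # Fraction of the needed testing that still fits before release (capped at 1).
    coverage = min(1.0, weeks_left / fix.weeks_needed_to_test)
    # Assumption: regressions caught in testing get fixed or backed out;
    # only the untested remainder of the risk reaches users.
    residual_risk = fix.p_regression * (1.0 - coverage)
    gained = fix.users_helped * fix.p_fix_works
    lost = fix.users_exposed * residual_risk
    return gained - lost


def should_accept(fix: ProposedFix, weeks_left: float) -> bool:
    """Accept only if the change is expected to improve the release overall."""
    return expected_net_benefit(fix, weeks_left) > 0


if __name__ == "__main__":
    # A hypothetical driver tweak that helps a few machines but touches all of them.
    driver_fix = ProposedFix(users_helped=100, p_fix_works=0.9,
                             users_exposed=10_000, p_regression=0.05,
                             weeks_needed_to_test=8)
    for weeks in (8, 4, 0.5):
        print(f"{weeks:>4} weeks before release: "
              f"net {expected_net_benefit(driver_fix, weeks):+.1f}, "
              f"accept={should_accept(driver_fix, weeks)}")
```

With these made-up numbers the same patch comes out ahead only when most of its eight weeks of testing still fit before release, and comes out well behind when it arrives days before the release, which is the "10 seconds before the release" point in code form.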
>
> Thanks,
>
> Robert N M Watson

thank you! very nice!