From: Paul Mather
To: Karl Denninger
Cc: Drew Tomlinson, freebsd-stable@freebsd.org
Date: Thu, 31 Mar 2005 11:02:25 -0500
Subject: Re: DANGER WILL ROBINSON! SERIOUS problem with current 5.4-PRERELEASE

On Wed, 2005-03-30 at 23:30 -0600, Karl Denninger wrote:

> BTW its NOT your hardware at fault here - the same hardware that returns
> these complaints for me on 5.x works perfectly with 4.11. There have been
> changes made to the ATA code that apparently interact VERY badly with
> some controllers - particularly some very common SATA (SII chipset, used
> on Adaptec and Bustek boards, among others) ones.

It's not just a SATA problem. I get the problem (though less frequently
than it seems you do) on an Intel PIIX4 UDMA33 controller. The problem
occurs on two different systems (one Gateway, one Dell), and only
started happening some way through the 5.x life cycle, indicating to me
that a serious regression was introduced (in 5.2, I believe). The
problem does not afflict 4.x.

> I don't know if GEOM/GMIRROR is truly involved here although that's the
> easiest way for me to provoke it - I suspect not - its just that
> GEOM/GMIRROR produces an I/O load pattern that is conducive to the
> breakage showing up. Specifically, a "DD" from one or more disks does NOT
> fail - a mix of reads and writes and fairly significant load appears
> necessary to cause trouble. Of course installation produces a very nice
> load of that type....

On both systems that experience the problem, I am using some kind of
software mirroring. On one I'm using geom_mirror, and on the other I'm
using geom_vinum. Both suffer from the WRITE_DMA disconnect problem.
The Dell, using geom_mirror, is now running HEAD.
The Gateway running RELENG_5 is annoying because when a drive becomes
disconnected, the only way right now to rebuild the plexes on the
geom_vinum drive that is down is to reboot the system. (I've used
"setstate" to flag the drive as up, but then "gvinum start" of any down
plex causes an immediate panic/reboot.)

Ian Dowse posted a patch to the freebsd-current mailing list for the
WRITE_DMA issue
(http://lists.freebsd.org/mailman/htdig/freebsd-current/2005-February/046773.html).
According to Dowse, the patch "attempts to clean up the handling of
timeouts in the ATA code by using the new callout_init_mtx() function."
It was partly successful for me: I still got the WRITE_DMA timeouts, but
not the disconnects. I don't know if RELENG_5 has "the new
callout_init_mtx() function." If it does, this patch might help there,
too.

> I opened a PR on this quite some time ago - IMHO this sort of breakage
> should be considered a critical fault sufficient to stop a release until
> its completely resolved. A workaround that stops the system from blowing up
> but leaves the pauses and errors isn't really a fix - I doubt anyone
> will consider that acceptable as a means of truly addressing the problem
> (at least I hope not!)

I agree that it wouldn't be ideal, but having something in the tree that
fixed just the disconnects would be better than nothing at all. It's a
pain to have to track third-party patches.

> I got "surprised" by this (in a bad way) and have been fighting
> workarounds since 5.3 was deemed "production" quality. Going back to
> 4.x is possible for me, but highly undesireable for a number of reasons, not
> the least of which is the official FreeBSD posture on where work is and will
> be done on the OS down the road.

It's disappointing the way this problem appears to have been silently
ignored (except by those whom it afflicts), because it is a regression
that occurred during the 5.x life cycle.
It's one thing to know that your hardware won't work properly going
from 4.x to 5.x, but another thing to have it stop working going from
one 5.x release to another. (Or maybe it isn't, given the strange
"Early Adopter" status of the start of the 5.x release cycle.)

Anyway, I'm glad you are trying to keep this problem in the spotlight,
because an unreliable ATA subsystem is a miserable thing to have to
suffer. :-(

Cheers,

Paul.
--
e-mail: paul@gromit.dlib.vt.edu

"Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid."
        --- Frank Vincent Zappa