From: Karl Denninger
To: Drew Tomlinson
Cc: freebsd-stable@freebsd.org
Date: Wed, 30 Mar 2005 23:30:18 -0600
Message-ID: <20050330233018.B68235@denninger.net>
In-Reply-To: <424AF396.6010909@mykitchentable.net>; from Drew Tomlinson on Wed, Mar 30, 2005 at 10:44:38AM -0800
Organization: Karl's Sushi and Packet Smashers
Subject: Re: DANGER WILL ROBINSON!
    SERIOUS problem with current 5.4-PRERELEASE

On Wed, Mar 30, 2005 at 10:44:38AM -0800, Drew Tomlinson wrote:
>
> I missed the beginning of this thread and apologize if my question has
> already been covered. But can you tell me if this issue might be the
> reason my PC locks up intermittently ? I have whatever cheap card came
> with a Maxtor 160 GB SATA drive installed in this machine and the PC ran
> fine with Windows. Now I'm trying install FBSD from the 5.4-BETA ISO I
> downloaded from the ftp site. The PC runs POST fine and always boots
> from the CD to the boot menu. After picking the default option 1
> (normal boot) the PC locks up anywhere from the dmesg output to
> sysinstall actually beginning to install the base package after doing
> the fdisk and disklabel stuff. Should I download 5.3-RELEASE and try
> installing from that?
>
> Thanks,
>
> Drew

5.3-RELEASE may lock up too, but in different ways. In a non-redundant
disk situation a bogus fatal write error hoses you in extremely bad ways,
including possible file or filesystem metadata damage.

I would NOT run 5.3 in an attempt to get around this, in that such damage
could remain "hidden" (although not without notice, as the errors will
show up on the console!) for quite some time until you discover "holes"
in your files or a critical metadata write craps out and causes a crash -
possibly with a corrupted disk that fsck can't fix. Grave danger (to your
data) lies down that road....

5.4-PRERELEASE, once the tests are complete (that I'm working on now),
the decisions on what to commit are made, and a new ISO is cut, should
work - it will bitch (a LOT) about retried writes, but it should work.
At least that's what I'm seeing right now - I can provoke the error, but
it doesn't kill the machine anymore and it also doesn't appear to corrupt
data as the retried write is (by all appearances) successful. It'll be a
couple of days before I can be SURE that what appears to be working right
now is in fact stable though, then however long it takes for the back
room stuff to get done and new ISOs generated.

BTW it's NOT your hardware at fault here - the same hardware that returns
these complaints for me on 5.x works perfectly with 4.11. There have been
changes made to the ATA code that apparently interact VERY badly with
some controllers - particularly some very common SATA ones (SII chipset,
used on Adaptec and Bustek boards, among others).

I don't know if GEOM/GMIRROR is truly involved here although that's the
easiest way for me to provoke it - I suspect not - it's just that
GEOM/GMIRROR produces an I/O load pattern that is conducive to the
breakage showing up. Specifically, a "dd" from one or more disks does NOT
fail - a mix of reads and writes and fairly significant load appears
necessary to cause trouble. Of course installation produces a very nice
load of that type....

I opened a PR on this quite some time ago - IMHO this sort of breakage
should be considered a critical fault sufficient to stop a release until
it's completely resolved. A workaround that stops the system from blowing
up but leaves the pauses and errors isn't really a fix - I doubt anyone
will consider that acceptable as a means of truly addressing the problem
(at least I hope not!)

I got "surprised" by this (in a bad way) and have been fighting
workarounds since 5.3 was deemed "production" quality. Going back to 4.x
is possible for me, but highly undesirable for a number of reasons, not
the least of which is the official FreeBSD posture on where work is and
will be done on the OS down the road.

The Intel ICH-based SATA adapters appear NOT to have this problem.
I've beat the living SNOT out of my two systems with ICH-based
motherboard SATA controllers on them for days at a time and have been
unable to provoke the problem - using the same disk drives. The SII-based
chipset boards I have (one Adaptec and one Bustek) reliably puke within
seconds with a simple large-directory copy. Both ran for a VERY long time
under 4.x and were completely stable.

Unfortunately I've yet to find an actual PCI card with the ICH chipset on
it - it is common among motherboard SATA controllers, but that doesn't
help people who need the adapter on a PCI card.

ATA-GenIII may fix all this but I've yet to try it. In any event that's a
research project right now, although it will likely soon get committed to
-HEAD. That still doesn't help you though in that it won't show up in
-STABLE until people are satisfied that it at worst is at least as good
as what's in there now.....

--
--
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net      My home on the net - links to everything I do!
http://scubaforum.org         Your UNCENSORED place to talk about DIVING!
http://www.spamcuda.net       SPAM FREE mailboxes - FREE FOR A LIMITED TIME!
http://genesis3.blogspot.com  Musings Of A Sentient Mind
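
For anyone wanting to check a suspect controller under the kind of load
described above (concurrent reads and writes rather than a plain
sequential dd), a minimal Python sketch might look like the following. It
is not the test used in this thread: the function names (write_and_hash,
hash_file, mixed_load) and the workers/files/size parameters are
hypothetical, and it assumes a scratch directory on the disk under test.
An immediate read-back may be served from the buffer cache rather than
the disk, so a clean run proves little unless the data set is much larger
than RAM.

```python
import hashlib
import os
import tempfile
import threading


def write_and_hash(path, size, chunk=64 * 1024):
    """Write `size` random bytes to `path`, fsync, return the SHA-256 hex digest."""
    digest = hashlib.sha256()
    remaining = size
    with open(path, "wb") as fh:
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))
            fh.write(block)
            digest.update(block)
            remaining -= len(block)
        fh.flush()
        os.fsync(fh.fileno())  # push the data toward the device
    return digest.hexdigest()


def hash_file(path, chunk=64 * 1024):
    """Read `path` back in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def _worker(directory, worker_id, files, size, mismatches, lock):
    # Each worker alternates writes and read-backs, so several workers
    # together produce interleaved read and write traffic.
    for n in range(files):
        path = os.path.join(directory, f"w{worker_id}_f{n}.bin")
        expected = write_and_hash(path, size)
        if hash_file(path) != expected:
            with lock:
                mismatches.append(path)


def mixed_load(directory, workers=4, files=4, size=256 * 1024):
    """Run concurrent write/read-back workers; return paths that failed to verify."""
    mismatches = []
    lock = threading.Lock()
    threads = [
        threading.Thread(
            target=_worker, args=(directory, i, files, size, mismatches, lock)
        )
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mismatches


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the disk under test.
    with tempfile.TemporaryDirectory() as scratch:
        print("files that failed verification:", mixed_load(scratch))
```

Point mixed_load at a directory on the filesystem in question and scale
workers, files, and size up until the load is significant; a non-empty
result lists files whose read-back did not match what was written.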