From: Joe Karthauser
Date: Thu, 21 May 2009 18:50:58 +0100
To: freebsd-stable@freebsd.org
Subject: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)

Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7 server, which now doesn't boot. :(

So, it's a raidz2 pool with a UFS/gmirror root partition split over 5 disks (gmirror on a 500 MB partition on each of the five disks, and raidz2 over the rest of each drive). What I did was to update the userland, and then reboot. I didn't upgrade the kernel (but I've subsequently done that and have the same problem).

What happens is that the kernel hangs booting just after displaying a LABEL message or ZFS pool/spool message. I _can_ get it to boot if I boot single user with ACPI switched off. When I do that I can manually start ZFS and mount all the partitions. However, one of the disks is missing... more on that next.

The machine is running a Gigabyte motherboard (domestic gamer P35 board, similar to this http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533, although it might be a DS4 variant). I've got 5 of the 6 SATA ports wired to a 5-unit SATA hot-swap bay (5 drives vertically mounted into 3 5-1/4" bays kind of thing).

Now, because of the gmirror I can boot the system on any disk, or combination of plugged-in disks. I should be able to succeed with the kernel probe up to the attempt to mount the root filesystem irrespective of any ZFS pool, etc. And, indeed, this has been working fine for about two years. But now it hangs in the same place no matter what disk I boot on (I've tried every bay). But without ACPI enabled it does appear to boot OK... what's going on here? Is it possible that the machine has developed a hardware fault?

Ok, finally, if I boot with ACPI disabled then one of the disks is missing.
If I unplug it I get a disconnect message from the ata device, and a reconnect and reinit attempt when I plug it back in, but no device appears on the bus. Usually I can do an 'atacontrol detach sata4; sleep 1; atacontrol attach sata4' and the device reappears. This works on the other buses, but not on the last one.

It's not the disk, because if I swap it into another bay, it comes up and appears on the bus. On the other hand, it doesn't appear to be that controller or slot in the drive bay, because if I unplug all the other disks the system will boot from that disk and get as far as the hang... hmm. Is this a consequence of disabling ACPI?

Does anyone have a clue what might be going on?

Joe