From owner-freebsd-fs@FreeBSD.ORG Sun Mar 24 18:54:17 2013
Message-ID: <514F4BD6.1060807@sneakertech.com>
Date: Sun, 24 Mar 2013 14:54:14 -0400
From: Quartz <quartz@sneakertech.com>
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: Failed pool causes system to hang
References: <20130321044557.GA15977@icarus.home.lan> <514AA192.2090006@sneakertech.com> <20130321085304.GB16997@icarus.home.lan> <20130324153342.GA3687@icarus.home.lan> <20130324155448.GA4122@icarus.home.lan>
In-Reply-To: <20130324155448.GA4122@icarus.home.lan>

>> However, commands like "zpool status"
>
> ...and it seems a typo I made in vim caused the rest of my sentence to get
> deleted before I sent it out. This should have read:
>
>> However, commands like "zpool status" work just fine, but things like
>> "zpool destroy" and so on block indefinitely ("mount drain"), which to
>> me makes some degree of sense.

I'll have to double check this. I *know* I've run status and had it hang,
but I'm not 100% certain I did it fast enough to guarantee that something
else didn't hit the pool first.

> Yes, you will need to reboot for the ZFS layer to effectively "un-wedge"
> itself from whatever catatonic state it's in. No argument: this is a bug
> somewhere, and my guess is that it relates to the confused state of the
> devices in CAM-land. But regardless, I think if you were to lose 3 of 4
> disks on a raidz2 pool you'd have much more serious things to be worried
> about than "well crap, I have to issue a reboot".

My concern is proper investigation and damage control. "It stopped working,
guess I should reboot" is the Windows way of administration. In the case of
serious hardware failure, rebooting or otherwise continuing to supply power
to the affected devices can be a very BAD thing. I'd like to have some idea
of what the heck happened before I blindly power-cycle something. (The sort
of commands I have in mind are at the bottom of this mail.)

> And yes, I did test a reboot in the scenario I described -- the system
> did reboot without physically pressing the button.

It *never* does for me. Ever.

> People who run servers remotely yet lack this capability are
> intentionally choosing [snip]

Before you get up on a high horse and preach at me, consider a couple of
things:

1) Yes, I can set that up, but this is a test box on my desk right now.

2) A hard reset is a hard reset is a hard reset. I'm not bitching that I
have to physically walk over to the machine, I'm bitching that *I HAVE TO
RESET IT AT ALL*.
Being able to reset it remotely is NOT an acceptable solution or
workaround, and has no bearing on my problem.

______________________________________
it has a certain smooth-brained appeal
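
For what it's worth, here's a rough sketch of what I mean by "investigate
first" -- assuming the wedged zpool process is still around (the pid 1234
below is made up, substitute whatever ps shows):

    # what is the hung process actually blocked on? (check the MWCHAN column)
    ps -l -p 1234
    # kernel stack of the stuck threads
    procstat -kk 1234
    # what does CAM still think is attached, before power gets touched
    camcontrol devlist
    # and what the kernel logged when the drives dropped out
    dmesg | tail -n 50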