From: Doug Ambrisko
To: John Baldwin
Cc: stable@freebsd.org, Bruno Ducrot, freebsd-stable@freebsd.org, Bill Moran
Date: Tue, 10 Oct 2006 10:20:39 -0700 (PDT)
Subject: Re: Dell 1950 does not properly respond to reboot and shutdown -p

John Baldwin writes:
| On Tuesday 10 October 2006 08:54, Bill Moran wrote:
| > In response to Doug Ambrisko:
| > > Bruno Ducrot writes:
| > > | On Wed, Oct 04, 2006 at 02:07:12PM -0400, Bill Moran wrote:
| > > | > In response to Bruno Ducrot:
| > > | > > Hi,
| > > | > >
| > > | > > On Wed, Oct 04, 2006 at 12:28:35PM -0400, Bill Moran wrote:
| > > | > > >
| > > | > > > A reboot causes the OS to halt, but the hardware just sits there on the
| > > | > > > shutdown screen.
| > > | > > >
| > > | > > > A shutdown -p does the same.
| > > | > >
| > > | > > What exactly are the last few lines?
| > > | >
| > > | > (manually copied)
| > > | >
| > > | > ...
| > > | > All buffers synced.
| > > | > Uptime: 1m16s
| > > | >
| > > |
| > > | Thanks. Then this happen after print_uptime().
| > > |
| > > | I believe one of the drivers register a shutdown_final (or
| > > | shutdown_post_sync) event that hang your system. I think (though I
| > > | may be wrong) mfi may be that one.
| > > |
| > > | It would help if you can add some printf in dev/mfi/mfi.c into the
| > > | mfi_shutdown() function in order to check if that assumption
| > > | is correct.
| > >
| > > Some what related to this we have a local hack:
| > >
| > > --- sys/kern/subr_bus.c.orig Tue Jun 27 15:49:39 2006
| > > +++ sys/kern/subr_bus.c Tue Jun 27 15:49:51 2006
| > > @@ -2906,6 +2906,7 @@ bus_generic_shutdown(device_t dev)
| > >         device_t child;
| > >  
| > >         TAILQ_FOREACH(child, &dev->children, link) {
| > > +               DELAY(1000);
| > >                 device_shutdown(child);
| > >         }
| >
| > This patch seems to "fix" the problem. I'm going to replace it with
| > some printfs and see if I can determine which driver is actually
| > causing the problem (hopefully it's only one).
| >
| > Am I wrong in saying that the correct solution would be to identify the
| > driver that needs more time and implementing some sort of polling
| > mechanism to ensure the hardware is ready when the driver wants to
| > shut down?
|
| Well, first let's see which driver it is. :) You might be able to just
| remove the DELAY and add a printf and see which device is printed last.

I think it was in different drivers.
One of our configs has the base HW plus a bge NIC; the other has the base HW plus two dual-port em NICs. The more NICs, the better the chance of hitting the problem.

I've removed the hack from our kernel and am going to run the reboot cycle. I don't think a printf will work: as I recall, adding one "fixed" the problem, which is why I put the DELAY in :-(

It could be a generic problem on systems with a CPU fast enough to beat the HW at shutting down. I'm not sure whether his system is Dempsey or Woodcrest. We use Woodcrest, and they really are faster. Other machines might be "slow" enough that it's not a problem! We haven't seen it on our older platforms with the same kernel and similar HW configs.

Doug A.
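Bill's suggestion, polling each device until it is actually ready instead of sleeping a fixed DELAY(1000) before every child, can be sketched in userland C. This is a rough illustration under stated assumptions, not FreeBSD kernel code: `struct fake_dev`, `dev_ready()`, `wait_ready()` and `shutdown_all()` are hypothetical names, the "ready after N status reads" model stands in for a real status register, and `nanosleep()` stands in for the kernel's DELAY(). It also prints each device name before shutting it down, which is John's "see which device is printed last" idea; as Doug notes, the printing itself adds time and may hide the race.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical device model (not a FreeBSD API): the "hardware" reports
 * ready only after `polls_needed` status reads, standing in for a NIC
 * that finishes quiescing some time after the driver asks it to. */
struct fake_dev {
    const char *name;
    int polls_needed;
    int polls_seen;
};

/* Hypothetical status check; a real driver would read a status register. */
static bool dev_ready(struct fake_dev *d)
{
    return ++d->polls_seen >= d->polls_needed;
}

/* Sleep for `us` microseconds; stands in for the kernel's DELAY(). */
static void delay_us(long us)
{
    struct timespec ts = { .tv_sec = us / 1000000,
                           .tv_nsec = (us % 1000000) * 1000 };
    nanosleep(&ts, NULL);
}

/* Poll until the device is ready: at most `max_polls` reads, sleeping
 * `step_us` after each failed read. Returns 0 when ready, -1 on timeout.
 * A device that is already quiet costs a single status read. */
static int wait_ready(struct fake_dev *d, int max_polls, long step_us)
{
    assert(d != NULL && max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (dev_ready(d))
            return 0;
        delay_us(step_us);
    }
    return -1;
}

/* Shut down each device in order, printing its name first so that, on a
 * hang, the last name on the console points at the likely culprit.
 * Returns the number of devices that never reported ready. */
static int shutdown_all(struct fake_dev *devs, int n, int max_polls,
                        long step_us)
{
    int timeouts = 0;
    for (int i = 0; i < n; i++) {
        printf("shutting down %s\n", devs[i].name);
        fflush(stdout);
        if (wait_ready(&devs[i], max_polls, step_us) != 0) {
            printf("%s: not ready after %d polls\n", devs[i].name, max_polls);
            timeouts++;
        }
    }
    return timeouts;
}
```

For example, with `max_polls = 10` and `step_us = 100`, the worst case per device is roughly the same 1 ms budget as the DELAY(1000) hack, but only a device that is still busy pays it; the bound keeps a wedged device from hanging the shutdown forever.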