Date:      Tue, 10 Oct 2006 15:44:43 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-stable@freebsd.org
Cc:        stable@freebsd.org, Bruno Ducrot <ducrot@poupinou.org>, Bill Moran <wmoran@collaborativefusion.com>
Subject:   Re: Dell 1950 does not properly respond to reboot and shutdown -p
Message-ID:  <200610101544.43903.jhb@freebsd.org>
In-Reply-To: <200610101720.k9AHKdMI099668@ambrisko.com>
References:  <200610101720.k9AHKdMI099668@ambrisko.com>

On Tuesday 10 October 2006 13:20, Doug Ambrisko wrote:
> John Baldwin writes:
> | On Tuesday 10 October 2006 08:54, Bill Moran wrote:
> | > In response to Doug Ambrisko <ambrisko@ambrisko.com>:
> | > > Bruno Ducrot writes:
> | > > | On Wed, Oct 04, 2006 at 02:07:12PM -0400, Bill Moran wrote:
> | > > | > In response to Bruno Ducrot <ducrot@poupinou.org>:
> | > > | > > Hi,
> | > > | > > 
> | > > | > > On Wed, Oct 04, 2006 at 12:28:35PM -0400, Bill Moran wrote:
> | > > | > > > 
> | > > | > > > A reboot causes the OS to halt, but the hardware just sits there on the
> | > > | > > > shutdown screen.
> | > > | > > > 
> | > > | > > > A shutdown -p does the same.
> | > > | > > 
> | > > | > > What exactly are the last few lines?
> | > > | > 
> | > > | > (manually copied)
> | > > | > 
> | > > | > ...
> | > > | > All buffers synced.
> | > > | > Uptime: 1m16s
> | > > | > 
> | > > | 
> | > > | Thanks.  Then this happens after print_uptime().
> | > > | 
> | > > | I believe one of the drivers registers a shutdown_final (or
> | > > | shutdown_post_sync) event that hangs your system.  I think (though I
> | > > | may be wrong) mfi may be that one.
> | > > | 
> | > > | It would help if you could add some printfs to the mfi_shutdown()
> | > > | function in dev/mfi/mfi.c in order to check whether that assumption
> | > > | is correct.
> | > > 
> | > > Somewhat related to this, we have a local hack:
> | > > 
> | > > --- sys/kern/subr_bus.c.orig	Tue Jun 27 15:49:39 2006
> | > > +++ sys/kern/subr_bus.c	Tue Jun 27 15:49:51 2006
> | > > @@ -2906,6 +2906,7 @@ bus_generic_shutdown(device_t dev)
> | > >  	device_t child;
> | > >  
> | > >  	TAILQ_FOREACH(child, &dev->children, link) {
> | > > +		DELAY(1000);
> | > >  		device_shutdown(child);
> | > >  	}
> | > 
> | > This patch seems to "fix" the problem.  I'm going to replace it with
> | > some printfs and see if I can determine which driver is actually
> | > causing the problem (hopefully it's only one).
> | > 
> | > Am I wrong in saying that the correct solution would be to identify the
> | > driver that needs more time and implement some sort of polling
> | > mechanism to ensure the hardware is ready when the driver wants to
> | > shut down?
> | 
> | Well, first let's see which driver it is. :)  You might be able to just
> | remove the DELAY and add a printf and see which device is printed last.
> 
> I think it was in different ones.  One of our configs has the base
> HW + bge NIC; the other has base HW + 2 x 2 port em NICs.  The more
> NICs, the better the chance of a problem.
> 
> I've removed the hack from our kernel and I'm going to run the reboot
> cycle.  I don't think a printf will work, since I recall that trying it
> "fixed" the problem, so I put the DELAY in :-(  It could be a generic
> problem on systems with a CPU fast enough to beat the
> HW at shutting down.  I'm not sure if his system is Dempsey or Woodcrest.
> We use Woodcrest and they are much faster.  Other machines might be
> "slow" enough that it's not a problem!  We haven't seen it on our older
> platforms with the same kernel and similar HW configs.

Can you break into the debugger when it is hung?  If so, change the
printf to a KTR trace, enable just that KTR trace, and do 'show ktr' in
ddb to see which devices were shut down.
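
To illustrate why an in-memory trace can help where a printf does not, here
is a minimal userspace sketch of the idea.  It is a model, not kernel code:
the ring buffer, the fake_dev structure, the function names, and the device
names in the usage note are all hypothetical.  In the kernel, the
trace_record() call would correspond to a KTR macro such as CTR1() placed
before device_shutdown(child) in bus_generic_shutdown().  The point is that
recording only stores a pointer in memory, so it should perturb timing far
less than console output, and the newest entry names the device whose
shutdown was entered last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TRACE_ENTRIES 8	/* hypothetical ring size */

/* In-memory trace ring; stores name pointers only, no formatting. */
struct trace_ring {
	const char *entries[TRACE_ENTRIES];
	size_t next;		/* total number of records written */
};

/* Record a name without any console I/O. */
static void
trace_record(struct trace_ring *r, const char *name)
{
	assert(r != NULL && name != NULL);
	r->entries[r->next % TRACE_ENTRIES] = name;
	r->next++;
}

/* Return the most recent record, or NULL if nothing was recorded. */
static const char *
trace_last(const struct trace_ring *r)
{
	if (r->next == 0)
		return (NULL);
	return (r->entries[(r->next - 1) % TRACE_ENTRIES]);
}

/* Print the surviving records, oldest first (a stand-in for 'show ktr'). */
static void
trace_dump(const struct trace_ring *r)
{
	size_t i, start;

	start = r->next > TRACE_ENTRIES ? r->next - TRACE_ENTRIES : 0;
	for (i = start; i < r->next; i++)
		printf("shutdown: %s\n", r->entries[i % TRACE_ENTRIES]);
}

/* Hypothetical device model: shutdown returns nonzero to model a hang. */
struct fake_dev {
	const char *name;
	int (*shutdown)(void);
};

static int shutdown_ok(void) { return (0); }
static int shutdown_stuck(void) { return (-1); }

/*
 * Walk the children in order, tracing each one BEFORE calling its
 * shutdown method, and stop at the first one that "hangs".  Returns
 * the number of children that were shut down successfully.
 */
static size_t
shutdown_children(struct trace_ring *r, const struct fake_dev *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		trace_record(r, devs[i].name);
		if (devs[i].shutdown() != 0)
			break;
	}
	return (i);
}
```

With a zeroed struct trace_ring and a list of three made-up devices
(say "em0" with shutdown_ok, "mfi0" with shutdown_stuck, "bge0" with
shutdown_ok), shutdown_children() stops at the second device and
trace_last() returns its name, which is the same information 'show ktr'
would give after breaking into ddb on the hung machine.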

-- 
John Baldwin