Date:      Wed, 19 Jun 2013 14:59:10 +0200
From:      "Ronald Klop" <ronald-freebsd8@klop.yi.org>
To:        freebsd-stable@freebsd.org, "Adam Strohl" <adams-freebsd@ateamsystems.com>
Subject:   Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
Message-ID:  <op.wyxfow1r8527sy@ronaldradial.versatec.local>
In-Reply-To: <51C1A9BF.8030304@ateamsystems.com>
References:  <51C1979D.3010305@ateamsystems.com> <20130619122143.GA70813@icarus.home.lan> <51C1A9BF.8030304@ateamsystems.com>

On Wed, 19 Jun 2013 14:53:19 +0200, Adam Strohl  
<adams-freebsd@ateamsystems.com> wrote:

> On 6/19/2013 19:21, Jeremy Chadwick wrote:
>> On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote:
>>> Hello -STABLE@,
>>>
>>> So I've seen this situation, seemingly at random, on a number of
>>> physical 9.1 boxes as well as VMs for at least 6-9 months, I would
>>> say.  I finally have a physical box here that reproduces it
>>> consistently and that I can reboot easily (i.e., not a
>>> production/client server).

Hi,

My home computer had the same symptom (not rebooting after the 'All
buffers synced' message) a couple of months ago. But I follow 9-STABLE,
and the problem has been gone for a while now.

Ronald.

>>>
>>> No matter what I do:
>>>
>>> reboot
>>> shutdown -p
>>> shutdown -r
>>>
>>> This specific server will stop at "All buffers synced" and not
>>> actually power down or reboot.  Keyboard input seems to be ignored.
>>> This server is a ZFS NAS (with GMIRROR for boot blocks), but the
>>> other boxes which show this are using GMIRRORs for root/swap/boot
>>> (no ZFS).
>>>
>>> Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg
>>>
>>> When I reset the server, it appears that the disks were not
>>> dismounted cleanly.  This ZFS box comes back quickly because ZFS is
>>> good like that, but on the other servers with GMIRROR roots,
>>> rebuilding the GMIRROR and fscking at the same time is murder on
>>> disk performance until it finishes.
>>
>> 1. You mention "as well as VMs".  Anything under a "virtual machine" or
>> under a hypervisor is going to be very, very, **VERY** different than
>> bare metal.  So I hope the issues you're talking about above are on bare
>> metal -- I will assume so.
>
> Nope, I sometimes see basically the same thing under the ESXi 5.0
> hypervisor (and yes, the implications of something so broad worry me).
> I just haven't been able to isolate those units on a server which isn't
> critical.  Let's focus on this server for now though, per your
> suggestion below.
>
>>
>> 2. We need to know what version of "9.1" you're using, i.e. 9.1-RELEASE.
>> If you use stable/9 (RELENG_9) we need to see uname -a output (you can
>> hide the machine name if you want).
>
> Sorry, this ZFS box is 9.1-R P4 (kernel built today):
>
> FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun 19  
> 15:31:12 ICT 2013     root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS   
> amd64
>
>>
>> 3. Can we please have dmesg from this machine?  The controller and some
>> other hardware details matter.
>
> Sure take a look at the full log here: http://pastebin.com/k55gVVuU
>
> This includes a boot, then a reboot as I describe (you can see it logs  
> the All Buffers Synced, etc) then powering back on.
>
>>
>> 4. Does "sysctl hw.usb.no_shutdown_wait=1" help you?
>
> Weirdly this allowed it to reboot on the first try (without needing to  
> be reset), but not the second.  The "Starting background file system  
> checks in 60 seconds" message appeared ... that only happens when  
> something is dirty, right?
>
> So on the second try with just this set, I could Ctrl+Alt+Del it and it
> responded ... kind of:
> http://i.imgur.com/POAIaNg.jpg
>
> Still had to reset it though.
>
>>
>> 5. Does "sysctl hw.acpi.handle_reboot=1" help you?
>
> No change.  It still responded to Ctrl+Alt+Del like above, but it
> still needs to be reset and comes back dirty.
>
>>
>> 6. Does "sysctl hw.acpi.disable_on_reboot=1" help you?
>
> No change.  Same as above: it responds to Ctrl+Alt+Del but still needs
> a hard reset.
>
>>
>> 7. If none of the above helps, can you please boot verbose mode and then
>> when the system "locks up" on "shutdown -r now" take a picture of the
>> VGA console?
>
> Lots of debug on boot obviously but not much different on shutdown/hang:
> http://i.imgur.com/SgzSsoP.jpg
>
>>
>> 8. Does the machine run moused(8) (check the process list please, do not
>> rely on rc.conf) ?
>
> ps -auxww | grep moused reveals nothing running (which is how I have  
> things set).
>
>>
>>> Another interesting thing is that this particular server runs slapd
>>> (OpenLDAP) which, when it comes back up, has a "corrupted" DB
>>> (easily fixed with db_recover, but still).  This might be because FS
>>> commits aren't happening at the end.   I can even manually stop
>>> slapd (service slapd stop) then run sync(8) (I assume this does
>>> something for ZFS too) and it still comes back as hosed if I reboot
>>> shortly after.  If I start/stop slapd it's fine.  So I feel like
>>> there is an FS/dismount thing going on here.
>>
>> sync(8) does not do what you think it does.  Please read (not skim) this
>> entire thread starting here:
>>
>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982
>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html
>
> Grokking this now ...
>
>>
>> Your problem is related to unclean shutdown; fix that and your issues go
>> away.
>
> Yeah that is my feeling as well.
>
>>
>>> Additional information: I also have some boxes which will reboot
>>> (i.e., they don't freeze at the end like some do), but they don't
>>> dismount cleanly either and have to both rebuild the GMIRROR and
>>> fsck.  This might be a different issue, too.
>>
>> Every issue needs to be handled/treated separately.
>
> Sure, I had just run across some threads about that, but I will focus
> on this ZFS box (and see whether anything that fixes it here helps with
> that issue once I can reliably reproduce it outside production).
>
>>
>
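
For anyone following along, the checks requested in the quoted thread
(uname -a, the three sysctls, and whether moused is actually running) can
be gathered in one go. This is only a rough sketch: the sysctl names are
the ones suggested above, but the helper names (show_tunable,
check_process, collect_report) are made up for illustration, and dmesg is
left out to keep the output short.

```shell
#!/bin/sh
# Sketch only: collects the details requested in the quoted thread.
# The function names are hypothetical helpers, not part of FreeBSD.

# Sysctl names exactly as suggested in the thread.
TUNABLES="hw.usb.no_shutdown_wait hw.acpi.handle_reboot hw.acpi.disable_on_reboot"

# Print "name=value", or "name=(unavailable)" if the sysctl cannot be read.
show_tunable() {
    value=$(sysctl -n "$1" 2>/dev/null) || value="(unavailable)"
    printf '%s=%s\n' "$1" "$value"
}

# Check the process list (not rc.conf) for an exact process name.
check_process() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        printf '%s: running\n' "$1"
    else
        printf '%s: not running\n' "$1"
    fi
}

# Print kernel version, current tunable values, and moused status.
collect_report() {
    uname -a
    for name in $TUNABLES; do
        show_tunable "$name"
    done
    check_process moused
}

collect_report
```

Running it before and after changing one sysctl at a time should make it
easier to report which setting was active for each reboot attempt.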


