From owner-freebsd-stable@FreeBSD.ORG  Wed Jun 19 13:15:11 2013
Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes
To: freebsd-stable@freebsd.org, "Adam Strohl" <adams-freebsd@ateamsystems.com>
Subject: Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly
 dismount
References: <51C1979D.3010305@ateamsystems.com>
 <20130619122143.GA70813@icarus.home.lan> <51C1A9BF.8030304@ateamsystems.com>
Date: Wed, 19 Jun 2013 14:59:10 +0200
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: "Ronald Klop" <ronald-freebsd8@klop.yi.org>
Message-ID: <op.wyxfow1r8527sy@ronaldradial.versatec.local>
In-Reply-To: <51C1A9BF.8030304@ateamsystems.com>
User-Agent: Opera Mail/12.15 (Win32)

On Wed, 19 Jun 2013 14:53:19 +0200, Adam Strohl  
<adams-freebsd@ateamsystems.com> wrote:

> On 6/19/2013 19:21, Jeremy Chadwick wrote:
>> On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote:
>>> Hello -STABLE@,
>>>
>>> I've seen this seemingly at random on a number of 9.1 machines, both
>>> physical boxes and VMs, for at least 6-9 months.  I finally have a
>>> physical box here that reproduces it consistently and that I can
>>> reboot easily (i.e., not a production/client server).

Hi,

My home computer had the same symptom (not rebooting after the 'All buffers
synced' message) a couple of months ago. But I follow 9-STABLE, and the
problem has been gone for a while now.

Ronald.

>>>
>>> No matter what I do:
>>>
>>> reboot
>>> shutdown -p
>>> shutdown -r
>>>
>>> This specific server will stop at "All buffers synced" and not
>>> actually power down or reboot.  KB input seems to be ignored.  This
>>> server is a ZFS NAS (with GMIRROR for boot blocks) but the other
>>> boxes which show this are using GMIRRORs for root/swap/boot (no
>>> ZFS).
>>>
>>> Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg
>>>
>>> When I reset the server it appears that disks were not dismounted
>>> cleanly ... on this ZFS box it comes back quick because ZFS is good
>>> like that but on the other servers with GMIRROR roots rebuilding the
>>> GMIRROR and fscking at the same time is murder on the
>>> disk/performance until it finishes.
>>
>> 1. You mention "as well as VMs".  Anything under a "virtual machine" or
>> under a hypervisor is going to be very, very, **VERY** different than
>> bare metal.  So I hope the issues you're talking about above are on bare
>> metal -- I will assume so.
>
> Nope, I sometimes see basically the same thing under the ESXi 5.0
> hypervisor (and yes, the implications of something so broad worry me).
> I just haven't been able to isolate those units on a server which isn't
> critical.  Let's focus on this server for now though, per your suggestion
> below.
>
>>
>> 2. We need to know what version of "9.1" you're using, i.e. 9.1-RELEASE.
>> If you use stable/9 (RELENG_9) we need to see uname -a output (you can
>> hide the machine name if you want).
>
> Sorry, this ZFS box is 9.1-R P4 (kernel built today):
>
> FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun 19  
> 15:31:12 ICT 2013     root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS   
> amd64
>
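
The release-versus-stable question above can be checked from the release string itself. This is only a rough sketch: branch_of is a made-up helper name, not an existing tool, and it assumes the string has the form printed by "uname -r" (for example 9.1-RELEASE-p4 or 9.1-STABLE).

```shell
# branch_of RELEASE_STRING
# Classify a FreeBSD release string by the branch marker it contains.
# Hypothetical helper for illustration; strings without a known marker
# (e.g. -PRERELEASE) are reported as "unknown".
branch_of() {
    case "$1" in
        *-RELEASE*) echo release ;;
        *-STABLE*)  echo stable ;;
        *-CURRENT*) echo current ;;
        *)          echo unknown ;;
    esac
}
```

On a live system it could be called as: branch_of "$(uname -r)"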
>>
>> 3. Can we please have dmesg from this machine?  The controller and some
>> other hardware details matter.
>
> Sure take a look at the full log here: http://pastebin.com/k55gVVuU
>
> This includes a boot, then a reboot as I describe (you can see it logs  
> the All Buffers Synced, etc) then powering back on.
>
>>
>> 4. Does "sysctl hw.usb.no_shutdown_wait=1" help you?
>
> Weirdly this allowed it to reboot on the first try (without needing to  
> be reset), but not the second.  The "Starting background file system  
> checks in 60 seconds" message appeared ... that only happens when  
> something is dirty, right?
>
> On the second try with just this set, I could press Ctrl+Alt+Del and it
> responded ... kind of:
> http://i.imgur.com/POAIaNg.jpg
>
> Still had to reset it though.
>
>>
>> 5. Does "sysctl hw.acpi.handle_reboot=1" help you?
>
> No change.  It still responded to Ctrl+Alt+Del like above, but it still
> needs to be reset and comes back dirty.
>
>>
>> 6. Does "sysctl hw.acpi.disable_on_reboot=1" help you?
>
> No change.  Same as above: it responds to Ctrl+Alt+Del but still needs a
> hard reset.
>
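
When testing the three sysctls from items 4-6, it may help to change only one per reboot attempt so the results are not mixed up. A minimal sketch, assuming 0 is the value to go back to for the other two (check the current values with sysctl first). isolate_cmds is a made-up helper name, and it only prints the commands without running them.

```shell
# Tunables suggested in items 4-6 of this thread.
TUNABLES="hw.usb.no_shutdown_wait hw.acpi.handle_reboot hw.acpi.disable_on_reboot"

# isolate_cmds NAME
# Print (do not execute) sysctl commands that set NAME to 1 and the other
# tunables to 0. Assumption: 0 is the value you want to return to.
isolate_cmds() {
    want=$1
    for t in $TUNABLES; do
        if [ "$t" = "$want" ]; then
            v=1
        else
            v=0
        fi
        printf 'sysctl %s=%s\n' "$t" "$v"
    done
}
```

For example, "isolate_cmds hw.acpi.handle_reboot" prints three sysctl lines that can be reviewed and then run as root before the next "shutdown -r now".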
>>
>> 7. If none of the above helps, can you please boot verbose mode and then
>> when the system "locks up" on "shutdown -r now" take a picture of the
>> VGA console?
>
> Lots of debug on boot obviously but not much different on shutdown/hang:
> http://i.imgur.com/SgzSsoP.jpg
>
>>
>> 8. Does the machine run moused(8) (check the process list please, do not
>> rely on rc.conf) ?
>
> ps -auxww | grep moused reveals nothing running (which is how I have  
> things set).
>
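
One caveat with "ps | grep" checks like the one above is that the grep process itself can show up in the output. A small sketch that matches the command name exactly instead. has_process is a made-up helper name, and it assumes its input is a list of command names, one per line.

```shell
# has_process NAME
# Read command names on stdin (one per line) and succeed only if one of
# them is exactly NAME. -F: fixed string, -x: whole line, -q: quiet.
has_process() {
    grep -Fqx -- "$1"
}
```

Assuming "ps -axo comm=" prints one bare command name per line on your system, usage would be: ps -axo comm= | has_process moused && echo "moused is running". "pgrep -x moused" should give the same answer.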
>>
>>> Another interesting thing is that this particular server runs slapd
>>> (OpenLDAP) which, when it comes back up, has a "corrupted" DB
>>> (easily fixed with db_recover, but still).  This might be because FS
>>> commits aren't happening at the end.   I can even manually stop
>>> slapd (service slapd stop) then run sync(8) (I assume this does
>>> something for ZFS too) and it still comes back as hosed if I reboot
>>> shortly after.  If I start/stop slapd it's fine.  So I feel like
>>> there is an FS/dismount thing going on here.
>>
>> sync(8) does not do what you think it does.  Please read (not skim) this
>> entire thread starting here:
>>
>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982
>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html
>
> Grokking this now ...
>
>>
>> Your problem is related to unclean shutdown; fix that and your issues go
>> away.
>
> Yeah that is my feeling as well.
>
>>
>>> Additional information: I also have some boxes which will reboot
>>> (i.e., they don't freeze like some do at the end) but they don't
>>> dismount cleanly either and have to rebuild both GMIRROR and fsck.
>>> This might be a different issue, too.
>>
>> Every issue needs to be handled/treated separately.
>
> Sure, I had just run across some threads about that, but I will focus on
> this ZFS box (and see whether whatever fixes it here also helps there,
> once I can reliably reproduce that outside production).
>
>>
>