From: "Ronald Klop"
To: freebsd-stable@freebsd.org, "Adam Strohl"
Subject: Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
Date: Wed, 19 Jun 2013 14:59:10 +0200
References: <51C1979D.3010305@ateamsystems.com> <20130619122143.GA70813@icarus.home.lan> <51C1A9BF.8030304@ateamsystems.com>
In-Reply-To: <51C1A9BF.8030304@ateamsystems.com>
List-Id: Production branch of FreeBSD source code

On Wed, 19 Jun 2013 14:53:19 +0200,
Adam Strohl wrote:

> On 6/19/2013 19:21, Jeremy Chadwick wrote:
>> On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote:
>>> Hello -STABLE@,
>>>
>>> So I've seen this situation seemingly randomly on a number of both
>>> physical 9.1 boxes as well as VMs for I would say 6-9 months at
>>> least. I finally have a physical box here that reproduces it
>>> consistently that I can reboot easily (i.e., not a production/client
>>> server).

Hi,

My home computer had the same symptom (not rebooting after the 'all
buffers flushed' message) a couple of months ago. But I follow 9-STABLE
and the problem has been gone for a while now.

Ronald.

>>>
>>> No matter what I do:
>>>
>>> reboot
>>> shutdown -p
>>> shutdown -r
>>>
>>> This specific server will stop at "All buffers synced" and not
>>> actually power down or reboot. KB input seems to be ignored. This
>>> server is a ZFS NAS (with GMIRROR for boot blocks) but the other
>>> boxes which show this are using GMIRRORs for root/swap/boot (no
>>> ZFS).
>>>
>>> Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg
>>>
>>> When I reset the server it appears that disks were not dismounted
>>> cleanly ... on this ZFS box it comes back quick because ZFS is good
>>> like that but on the other servers with GMIRROR roots rebuilding the
>>> GMIRROR and fscking at the same time is murder on the
>>> disk/performance until it finishes.
>>
>> 1. You mention "as well as VMs". Anything under a "virtual machine" or
>> under a hypervisor is going to be very, very, **VERY** different than
>> bare metal. So I hope the issues you're talking about above are on bare
>> metal -- I will assume so.
>
> Nope, I see basically the same thing sometimes under ESXi 5.0 Hypervisor
> (and yes, the implications of something so broad worry me). Those
> units I just haven't been able to isolate on a server which isn't
> critical. Let's focus on this server for now though, per your suggestion
> below.
>
>>
>> 2. We need to know what version of "9.1" you're using, i.e. 9.1-RELEASE.
>> If you use stable/9 (RELENG_9) we need to see uname -a output (you can
>> hide the machine name if you want).
>
> Sorry, this ZFS box is 9.1-R P4 (kernel built today):
>
> FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun 19
> 15:31:12 ICT 2013 root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS
> amd64
>
>>
>> 3. Can we please have dmesg from this machine? The controller and some
>> other hardware details matter.
>
> Sure, take a look at the full log here: http://pastebin.com/k55gVVuU
>
> This includes a boot, then a reboot as I describe (you can see it logs
> the All Buffers Synced, etc.), then powering back on.
>
>>
>> 4. Does "sysctl hw.usb.no_shutdown_wait=1" help you?
>
> Weirdly this allowed it to reboot on the first try (without needing to
> be reset), but not the second. The "Starting background file system
> checks in 60 seconds" message appeared ... that only happens when
> something is dirty, right?
>
> So on the second try with just this I could ctrl alt del it and it
> responded .. kind of:
> http://i.imgur.com/POAIaNg.jpg
>
> Still had to reset it though.
>
>>
>> 5. Does "sysctl hw.acpi.handle_reboot=1" help you?
>
> No change. It still responded to a ctrl alt del like above, but it
> still needs to be reset and comes back dirty.
>
>>
>> 6. Does "sysctl hw.acpi.disable_on_reboot=1" help you?
>
> No change. Same as above, ctrl alt del responds but needs a hard reset
> still.
>
>>
>> 7. If none of the above helps, can you please boot verbose mode and then
>> when the system "locks up" on "shutdown -r now" take a picture of the
>> VGA console?
>
> Lots of debug on boot obviously but not much different on shutdown/hang:
> http://i.imgur.com/SgzSsoP.jpg
>
>>
>> 8. Does the machine run moused(8) (check the process list please, do not
>> rely on rc.conf)?
>
> ps -auxww | grep moused reveals nothing running (which is how I have
> things set).
>
>>
>>> Another interesting thing is that this particular server runs slapd
>>> (OpenLDAP) which, when it comes back up, has a "corrupted" DB
>>> (easily fixed with db_recover, but still). This might be because FS
>>> commits aren't happening at the end. I can even manually stop
>>> slapd (service slapd stop) then run sync(8) (I assume this does
>>> something for ZFS too) and it still comes back as hosed if I reboot
>>> shortly after. If I start/stop slapd it's fine. So I feel like
>>> there is an FS/dismount thing going on here.
>>
>> sync(8) does not do what you think it does. Please read (not skim) this
>> entire thread starting here:
>>
>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982
>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html
>
> Grokking this now ..
>
>>
>> Your problem is related to unclean shutdown; fix that and your issues go
>> away.
>
> Yeah that is my feeling as well.
>
>>
>>> Additional information: I also have some boxes which will reboot
>>> (i.e., they don't freeze like some do at the end) but they don't
>>> dismount cleanly either and have to rebuild both GMIRROR and fsck.
>>> This might be a different issue, too.
>>
>> Every issue needs to be handled/treated separately.
>
> Sure, I had just run across some threads about that but will focus on
> this ZFS box (and see if anything that fixes it here does anything for
> that once I can reliably reproduce it out of production).
>
>>
>