Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 19 Jun 2013 21:15:18 +0700
From:      Adam Strohl <adams-freebsd@ateamsystems.com>
To:        Jeremy Chadwick <jdc@koitsu.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount
Message-ID:  <51C1BCF6.8090606@ateamsystems.com>
In-Reply-To: <20130619133538.GA71689@icarus.home.lan>
References:  <51C1979D.3010305@ateamsystems.com> <20130619122143.GA70813@icarus.home.lan> <51C1A9BF.8030304@ateamsystems.com> <20130619133538.GA71689@icarus.home.lan>

next in thread | previous in thread | raw e-mail | index | archive | help
On 6/19/2013 20:35, Jeremy Chadwick wrote:

>> Nope, I see basically the same thing sometimes under ESXi 5.0
>> Hypervisor (and yes it worries me the implications of something so
>> broad).  Those unites I just haven't been able to isolate on a
>> server which isn't critical.  Lets focus on this server for now
>> though per your suggestion below.
>
> I'm sorry but I don't understand your first sentence -- the first part
> of your sentence says "nope" (I have to assume in reply to my "on bare
> metal" part), but then says "I see basically the same thing sometimes
> under ESXi" which implies an alternate environment in comparison (i.e.
> we *are* talking about bare metal).  Consider me confused.  :-)

Basically: The issue is extremely similar if not the same root cause, be 
it a native or virtual server.  This server though is native.

>
>>> 2. We need to know what version of "9.1" you're using, i.e. 9.1-RELEASE.
>>> If you use stable/9 (RELENG_9) we need to see uname -a output (you can
>>> hide the machine name if you want).
>>
>> Sorry, this ZFS box is 9.1-R P4 (kernel built today):
>>
>> FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun
>> 19 15:31:12 ICT 2013
>> root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS  amd64
>
> I suggest trying stable/9 (and staying with it, for that matter).

The issue is no binary updates and we have a large deploy base, so we've 
stuck with -R and use it internally because it's what we deploy.

>
>>> 3. Can we please have dmesg from this machine?  The controller and some
>>> other hardware details matter.
>>
>> Sure take a look at the full log here: http://pastebin.com/k55gVVuU
>>
>> This includes a boot, then a reboot as I describe (you can see it
>> logs the All Buffers Synced, etc) then powering back on.
>
> Thanks.  I was mainly interested in the storage controller being used
> (in this case ahci(4)) and the disks being used (notorious ST3000DM001,
> known for excessively parking heads).

Yeah, was not my first choice but then again ... RAIDZ-2 :)  HD supply 
chain here (Thailand) is weird considering how many are made here (and 
can't buy).  Smartd screams about them possibly needing a firmware 
update (they don't according to Seagate).   Had no issues aside from a 
failure a month or so again (it's an HD ... it happens).

> AFAIK this isn't one of the
> controllers that was known for weird "quirky issues" pertaining to
> flushing data to disk on shutdown.
>
> I have to ask: is this FreeBSD box running under a HV?

No, native/direct for sure on this one.

>
> If it *is not* running under a HV, could we please get exact motherboard
> model and version (including BIOS version)?  Sometimes (not always) you
> can get this from "kenv | grep smbios."

No problem I built this one personally:

Asus P8B-X BIOS revision 6103


>
> I can also see you're running your own kernel.  We'll get to that in a
> moment.

It's GENERIC with the following added to the end:

# -- Add Support for nicer console
#
options VESA
options SC_PIXEL_MODE

# -- PF Support
#
device pf
device pflog
device pfsync

# -- Core temperature reporting
#
device          coretemp         # For Intel CPUs

device          smbios

>
>>> 4. Does "sysctl hw.usb.no_shutdown_wait=1" help you?
>>
>> Weirdly this allowed it to reboot on the first try (without needing
>> to be reset), but not the second.
>
> I'm not surprised.  Pleas re-try with stable/9; Hans has been constantly
> working on the USB stack and fixing major bugs.

Got it but probably not going to go this route as it means no more 
binary upgrades.  While I can reboot it, it is the office NAS here and 
so 'testing out' -STABLE I think probably isn't going to happen.

>
>> The "Starting background file
>> system checks in 60 seconds" message appeared ... that only happens
>> when something is dirty, right?
>
> No it does not.  That message is always printed when you use background
> fsck, which is the default.

Got it.

>
> I do not advocate using background fsck, because it has been known (and
> may still do this -- I do not care to find out, I do not have time for
> unreliable filesystem nonsense) to not always fix all filesystem
> problems.  Meaning: people using background fsck have been known to boot
> into single-user and issue "fsck" manually and find issues.
>
> Place background_fsck="no" in /etc/rc.conf.  If the machine does not
> have a clean filesystem on boot-up, you'll know because the system will
> immediately begin fsck (in the foreground actively).  You'll recognise
> that output if it happens, trust me.

Preaching to the choir, we set this on all servers this one somehow did 
not have it set (I think due to ZFS making it unique and not copying our 
rc.conf template over properly).

>
>> So the second try with just this I could ctrl alt del it and it
>> responded .. kind of:
>> http://i.imgur.com/POAIaNg.jpg
>>
>> Still had to reset it though.
>
> This looks like a chicken-and-egg problem -- you're probably fighting
> with background fsck, as the message there indicate "some processes
> would not die".  I'm just taking a guess though.

Yeah.  Even with no background fsck though I still see the issue 
(rebooted 4-5 times testing things further).

I even rolled back to the -p3 kernel because this server was not doing 
this a month or two ago, and definitely not when it was put into 
production.  I only just saw/noticed it doing this when rebooting for p4.

>
> I am now going to ask you for more information:
>
> 1. "gpart show -p xxx" where xxx is each disk you have in the system

 >>>> gpart show -p
=>        34  5860533101    ada0  GPT  (2.7T)
           34           6          - free -  (3.0k)
           40         256  ada0p1  freebsd-boot  (128k)
          296        1752          - free -  (876k)
         2048  5860530176  ada0p2  freebsd-zfs  (2.7T)
   5860532224         911          - free -  (455k)

=>        34  5860533101    ada1  GPT  (2.7T)
           34           6          - free -  (3.0k)
           40         256  ada1p1  freebsd-boot  (128k)
          296        1752          - free -  (876k)
         2048  5860530176  ada1p2  freebsd-zfs  (2.7T)
   5860532224         911          - free -  (455k)

=>        34  5860533101    ada2  GPT  (2.7T)
           34           6          - free -  (3.0k)
           40         256  ada2p1  freebsd-boot  (128k)
          296        1752          - free -  (876k)
         2048  5860530176  ada2p2  freebsd-zfs  (2.7T)
   5860532224         911          - free -  (455k)

=>        34  5860533101    ada3  GPT  (2.7T)
           34           6          - free -  (3.0k)
           40         256  ada3p1  freebsd-boot  (128k)
          296        1752          - free -  (876k)
         2048  5860530176  ada3p2  freebsd-zfs  (2.7T)
   5860532224         911          - free -  (455k)

=>        34  5860533101    ada4  GPT  (2.7T)
           34           6          - free -  (3.0k)
           40         256  ada4p1  freebsd-boot  (128k)
          296        1752          - free -  (876k)
         2048  5860530176  ada4p2  freebsd-zfs  (2.7T)
   5860532224         911          - free -  (455k)

=>        34  5860533101    ada5  GPT  (2.7T)
           34           6          - free -  (3.0k)
           40         256  ada5p1  freebsd-boot  (128k)
          296        1752          - free -  (876k)
         2048  5860530176  ada5p2  freebsd-zfs  (2.7T)
   5860532224         911          - free -  (455k)

Here also with labels so you get an idea of what the scheme here is:

 >>>> gpart show -p -l
=>        34  5860533101    ada0  GPT  (2.7T)
           34           6          - free -  (3.0k)
           40         256  ada0p1  boot0  (128k)
          296        1752          - free -  (876k)
         2048  5860530176  ada0p2  root0  (2.7T)
   5860532224         911          - free -  (455k)

=>        34  5860533101    ada1  GPT  (2.7T)
           34           6          - free -  (3.0k)
           40         256  ada1p1  boot1  (128k)
          296        1752          - free -  (876k)
         2048  5860530176  ada1p2  root1  (2.7T)
   5860532224         911          - free -  (455k)

=>        34  5860533101    ada2  GPT  (2.7T)
           34           6          - free -  (3.0k)
           40         256  ada2p1  boot2  (128k)
          296        1752          - free -  (876k)
         2048  5860530176  ada2p2  root2  (2.7T)
   5860532224         911          - free -  (455k)

=>        34  5860533101    ada3  GPT  (2.7T)
           34           6          - free -  (3.0k)
           40         256  ada3p1  boot3  (128k)
          296        1752          - free -  (876k)
         2048  5860530176  ada3p2  root3  (2.7T)
   5860532224         911          - free -  (455k)

=>        34  5860533101    ada4  GPT  (2.7T)
           34           6          - free -  (3.0k)
           40         256  ada4p1  boot4  (128k)
          296        1752          - free -  (876k)
         2048  5860530176  ada4p2  root4  (2.7T)
   5860532224         911          - free -  (455k)

=>        34  5860533101    ada5  GPT  (2.7T)
           34           6          - free -  (3.0k)
           40         256  ada5p1  boot5  (128k)
          296        1752          - free -  (876k)
         2048  5860530176  ada5p2  root5  (2.7T)
   5860532224         911          - free -  (455k)

> 2. gmirror list

 >>>> gmirror list
Geom name: boot
State: COMPLETE
Components: 6
Balance: load
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 3404031251
Providers:
1. Name: mirror/boot
    Mediasize: 130560 (127k)
    Sectorsize: 512
    Stripesize: 4096
    Stripeoffset: 0
    Mode: r0w0e0
Consumers:
1. Name: ada0p1
    Mediasize: 131072 (128k)
    Sectorsize: 512
    Stripesize: 4096
    Stripeoffset: 0
    Mode: r1w1e1
    State: ACTIVE
    Priority: 0
    Flags: NONE
    GenID: 0
    SyncID: 1
    ID: 3045005471
2. Name: ada1p1
    Mediasize: 131072 (128k)
    Sectorsize: 512
    Stripesize: 4096
    Stripeoffset: 0
    Mode: r1w1e1
    State: ACTIVE
    Priority: 1
    Flags: NONE
    GenID: 0
    SyncID: 1
    ID: 1958473873
3. Name: ada2p1
    Mediasize: 131072 (128k)
    Sectorsize: 512
    Stripesize: 4096
    Stripeoffset: 0
    Mode: r1w1e1
    State: ACTIVE
    Priority: 2
    Flags: NONE
    GenID: 0
    SyncID: 1
    ID: 1208229558
4. Name: ada3p1
    Mediasize: 131072 (128k)
    Sectorsize: 512
    Stripesize: 4096
    Stripeoffset: 0
    Mode: r1w1e1
    State: ACTIVE
    Priority: 3
    Flags: NONE
    GenID: 0
    SyncID: 1
    ID: 3928010527
5. Name: ada4p1
    Mediasize: 131072 (128k)
    Sectorsize: 512
    Stripesize: 4096
    Stripeoffset: 0
    Mode: r1w1e1
    State: ACTIVE
    Priority: 4
    Flags: NONE
    GenID: 0
    SyncID: 1
    ID: 442340132
6. Name: ada5p1
    Mediasize: 131072 (128k)
    Sectorsize: 512
    Stripesize: 4096
    Stripeoffset: 0
    Mode: r1w1e1
    State: ACTIVE
    Priority: 0
    Flags: NONE
    GenID: 0
    SyncID: 1
    ID: 1281187492

> 3. Any/all details of your gmirror setup or other things you can
>     think of when you set it up

The only thing is that we use GMIRROR on the partition level because we 
use GPT (which is clear from the gpart output I think).  I gmirror the 
boot partition only in this case as I use ZFS backed swap and ZFS root 
for this server.

> 4. Contents of /etc/fstab

 >>>> cat /etc/fstab
# Device                Mountpoint      FStype  Options Dump    Pass#
# NOTE: ZFS root is not managed here
/dev/zvol/zroot/swap    none            swap    sw              0       0

> 5. Contents of /boot/loader.conf

 >>>> cat /boot/loader.conf
geom_mirror_load="YES"
zfs_load="YES"
vfs.root.mountfrom="zfs:zroot"
aio_load="YES"
if_lagg_load="YES"


> 6. Contents of /etc/rc.conf

# ---- Don't run FS check and let apps start
#
fsck_y_enable="YES"
background_fsck="NO"

# ---- Power management enables SpeedStep and TurboBoost
#
powerd_enable="YES"
powerd_flags="-a hiadaptive"

# ---- Networking
#
hostname="hostname"
defaultrouter="xxx.xxx.xxx.3"
# -- LACP
ifconfig_em0="up"
ifconfig_em1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 xxx.xxx.xxx.212/24"

# ---- Services
#
sshd_enable="YES"
smartd_enable="YES"
samba_enable="YES"
zabbix_agentd_enable="YES"
zfs_enable="YES"
apcupsd_enable="YES"
slapd_enable="YES"
slapd_flags='-h "ldapi://%2fvar%2frun%2fopenldap%2fldapi/ 
ldap://xxx.xxx.xxx.212/ ldap://127.0.0.1/"'
slapd_sockets="/var/run/openldap/ldapi"

# ---- Time Stuff
#
ntpd_enable="YES"
ntpd_sync_on_start="YES"

# ---- Mail
#
postfix_enable="YES"
sendmail_enable="NO"
sendmail_submit_enable="NO"
sendmail_outbound_enable="NO"
sendmail_msp_queue_enable="NO"

> 7. Contents of /etc/sysctl.conf

kern.maxfiles=25600
kern.maxfilesperproc=16384
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536

> 8. Contents of /sys/amd64/conf/ATEAMSYSTEMS

See above

>
>>> 5. Does "sysctl hw.acpi.handle_reboot=1" help you?
>>
>> No change, still responded to a ctrl alt del like above, but like
>> that still needs to be reset and comes back dirty.
>>
>>>
>>> 6. Does "sysctl hw.acpi.disable_on_reboot=1" help you?
>>
>> No change.  Same as above, ctrl alt del responds but needs a hard
>> reset still.
>
> Okay, thank you.
>
>>> 7. If none of the above helps, can you please boot verbose mode and then
>>> when the system "locks up" on "shutdown -r now" take a picture of the
>>> VGA console?
>>
>> Lots of debug on boot obviously but not much different on shutdown/hang:
>> http://i.imgur.com/SgzSsoP.jpg
>
> It looks to me like the ACPI layer is still actively working at the time
> "all buffers are synced", meaning the actual reboot phase itself never
> happens.  This to me starts to smell of an ACPI problem, but I do not
> have the skill set to debug this, and I'm also grasping at straws.
> There are many things that happen during that phase of operation,
> particularly the "USB shutdown" phase.

Yeah.  Originally I had even my UPS (APC) disconnected, the only USB 
device (via a port -- I realize there might be MB virtual ports) was a 
Dell KB.

>
> But it all depends on your kernel config, which I've now asked for.

Yeah

-- 
Adam Strohl
http://www.ateamsystems.com/



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?51C1BCF6.8090606>