Date:      Fri, 29 Aug 2014 14:24:07 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Daniel Andersen <dea@caida.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Process enters unkillable state and somewhat wedges zfs
Message-ID:  <5842681.mjgMD2kESs@ralph.baldwin.cx>
In-Reply-To: <53FE4C9F.7030406@caida.org>
References:  <53F25402.1020907@caida.org> <201408271639.09352.jhb@freebsd.org> <53FE4C9F.7030406@caida.org>

On Wednesday, August 27, 2014 02:24:47 PM Daniel Andersen wrote:
> On 08/27/2014 01:39 PM, John Baldwin wrote:
> > These are all blocked in "zfs" then.  (For future reference, the 'mwchan'
> > field that you see as 'STATE' in top or via 'ps O mwchan' is more detailed
> > than the 'D' state.)
> > 
> > To diagnose this further, you would need to see which thread holds the
> > ZFS vnode lock these threads need.  I have some gdb scripts you can use to
> > do that at www.freebsd.org/~jhb/gdb/.  You would want to download 'gdb6*'
> > files from there and then do this as root:
> > 
> > # cd /path/to/gdb/files
> > # kgdb
> > (kgdb) source gdb6
> > (kgdb) sleepchain 42335
> > 
> > Where '42335' is the pid of some process stuck in "zfs".
> 
> I will keep this in mind the next time the machine wedges.  Another data
> point: the second procstat output I sent was the most recent.  All the
> processes listed there were after the fact.  The process that started the
> entire problem ( this time ) was sudo, and it only has this one entry in
> procstat:
> 
> 38003 102797 sudo             -                <running>
> 
> Of note, this does not appear to be blocked on zfs in any way.  'ps' showed
> it in 'R' state instead of 'D' ( I will be sure to use mwchan in the
> future. ) It appeared to be pegging an entire CPU core at 100% usage, as
> well, and was only single threaded.

Well, if it is spinning in some sort of loop in the kernel while holding a
ZFS vnode lock, that could be blocking all the other threads.  In that case,
you don't need to do what I asked for above.  Instead, we need to find out
what that thread is doing.  There are two ways of doing this.  One is to
force a panic via 'sysctl debug.kdb.panic=1' and then use kgdb on the
crashdump to determine what the running thread is doing.  Another option
is to break into the DDB debugger on the console (note that you will need
to build a custom kernel with DDB if you are on stable) and request a
stack trace of the running process via 'tr <pid>'.  Ideally you can do this
over a serial console so you can just cut and paste the output of the trace
into a mail.  Over a video console you can either transcribe it by hand or
take photos.
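
As a rough sketch of the first option (the crashdump route), something like
the following could be used.  The helper names (run, force_panic, open_dump)
and the DRY_RUN switch are made up for illustration, and the kernel and
vmcore paths are assumed defaults rather than anything confirmed for this
machine.  It also assumes a dump device is already configured so the panic
actually leaves a dump behind.

```shell
# Sketch only.  With DRY_RUN set, commands are printed instead of executed,
# so the sequence can be reviewed before panicking a production machine.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: force a panic so the kernel writes a crash dump.
# Assumption: a dump device is configured (e.g. dumpdev="AUTO" in rc.conf).
force_panic() {
    run sysctl debug.kdb.panic=1
}

# Step 2 (after the reboot): open the saved dump in kgdb.
# Both default paths are assumptions; pass the real ones as $1 and $2.
open_dump() {
    run kgdb "${1:-/boot/kernel/kernel}" "${2:-/var/crash/vmcore.0}"
}
```

Running with DRY_RUN=1 first shows exactly what would be executed.  Once
kgdb has the dump open, selecting the spinning thread (for example with
'tid 102797', the thread ID from the procstat line quoted above) and then
'bt' should show where it is looping.  For the DDB route, the custom kernel
would presumably need 'options KDB' and 'options DDB' in its config file.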

-- 
John Baldwin
