Date:      Wed, 2 Mar 2016 03:04:40 -0800
From:      Maxim Sobolev <sobomax@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        stable@freebsd.org, freebsd-fs@freebsd.org,  Kirk McKusick <mckusick@mckusick.com>
Subject:   Re: Process stuck in "vnread"
Message-ID:  <CAH7qZfsCqVPxo5LAOag%2BFwjfb6GOzRVjLooB9kGQ-wSAXGUQ0w@mail.gmail.com>
In-Reply-To: <CAH7qZfs4jCiP=ARaZjGGW1XVa63a-oOkaWtCO1L1-Hk%2Bema7OQ@mail.gmail.com>
References:  <CAH7qZfs3EwT8jnKyodHxF_5nK18MeLSaB_F-qqOfwV0MJMD7Vg@mail.gmail.com> <20160302095339.GB67250@kib.kiev.ua> <CAH7qZfs4jCiP=ARaZjGGW1XVa63a-oOkaWtCO1L1-Hk%2Bema7OQ@mail.gmail.com>

Sorry, Gmail hit send too early. Backtrace from the md worker:

[Switching to thread 357 (Thread 101131)]#0  0xffffffff8095244e in sched_switch ()
(kgdb) bt
#0  0xffffffff8095244e in sched_switch ()
#1  0xffffffff809313b1 in mi_switch ()
#2  0xffffffff8097089a in sleepq_wait ()
#3  0xffffffff808d344d in _cv_wait ()
#4  0xffffffff81a42185 in ?? ()
#5  0xfffff803096d3960 in ?? ()
#6  0x0000000000000000 in ?? ()
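
The `??` frames can at least be attributed to a module by comparing their addresses with the load address and size columns from the kldstat output quoted further down. A minimal Python sketch of that range check follows. The helper name owning_module is made up for illustration, and the sketch assumes the two quoted kldstat lines are the only modules loaded in this address range. The offset it reports is relative to the module's load address and is not necessarily usable as a symbol offset as-is.

```python
# Load addresses and sizes copied from the kldstat output quoted in this thread.
MODULES = [
    ("nullfs.ko", 0xffffffff819bd000, 0xaef8),
    ("zfs.ko", 0xffffffff819c8000, 0x2fd2f0),
]


def owning_module(addr, modules=MODULES):
    """Return (module name, offset from load address), or None if no module matches."""
    for name, base, size in modules:
        if base <= addr < base + size:
            return name, addr - base
    return None


if __name__ == "__main__":
    # First undecoded frame from each backtrace in this thread:
    # md worker (#4), cp (#15), and the bash process stuck in "zfs" (#8).
    for addr in (0xffffffff81a42185, 0xffffffff819f699c, 0xffffffff81a8c9cd):
        hit = owning_module(addr)
        if hit is None:
            print(f"{addr:#x}: not in a listed module")
        else:
            name, offset = hit
            print(f"{addr:#x}: {name} + {offset:#x}")
```

For the three addresses above, this check places all of them inside zfs.ko and none inside nullfs.ko.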


On Wed, Mar 2, 2016 at 3:02 AM, Maxim Sobolev <sobomax@sippysoft.com> wrote:

> Thanks, Konstantin.
>
> Re: md(4) state:
>
>    0 88688     0   0  -8  0       0      16 tx->tx_s DL    -       0:45.43 [md0]
>
> Its backtrace:
>
>
> About the backtrace: indeed, it looks like you are right and some portion of
> it is not decoded properly, because the code in question is loaded as a
> kernel module. The setup is even more complicated than I described:
> /usr/ports is mounted via NULLFS, so in this command:
>
> cp /usr/local/share/automake-1.15/compile ./compile
>
> The target (i.e. ./compile) here is a path on ZFS that is exported via
> NULLFS, while the source is a file on UFS2->md->ZFS. This is probably the
> reason the stack trace is incomplete: both zfs.ko and nullfs.ko are loaded
> as modules, and the next few frames point into those. Unfortunately, I
> cannot get kgdb to read symbols from those .ko files and decode them.
>
> #13 0xffffffff80cb36f1 in copyin ()
> #14 0xffffffff80977ddf in uiomove_faultflag ()
> #15 0xffffffff819f699c in ?? ()
> #16 0xfffffe0468a861a0 in ?? ()
> #17 0xfffff80000000000 in ?? ()
> #18 0xfffffe0468a861a0 in ?? ()
> #19 0xfffff80176b39420 in ?? ()
> #20 0x0000000000000001 in ?? ()
>
> $ kldstat | grep 0xffffffff819
>  2    1 0xffffffff819bd000 aef8     nullfs.ko
>  3    1 0xffffffff819c8000 2fd2f0   zfs.ko
>
>
>
>
> On Wed, Mar 2, 2016 at 1:53 AM, Konstantin Belousov <kostikbel@gmail.com>
> wrote:
>
>> On Wed, Mar 02, 2016 at 01:12:31AM -0800, Maxim Sobolev wrote:
>> > Hi, I've encountered a cp(1) process stuck in the vnread state on one of my
>> > build machines that was recently upgraded to 10.3.
>> >
>> >    0 79596     1   0  20  0   17092    1396 wait     I     1  0:00.00 /bin/sh /usr/local/bin/autoreconf -f -i
>> >    0 79602 79596   0  52  0   41488    9036 wait     I     1  0:00.07 /usr/local/bin/perl -w /usr/local/bin/autoreconf-2.69 -f -i
>> >    0 79639 79602   0  72  0       0       0 -        Z     1  0:00.27 <defunct>
>> >    0 79762 79602   0  20  0   17092    1396 wait     I     1  0:00.00 /bin/sh /usr/local/bin/automake --add-missing --copy --force-missing
>> >    0 79768 79762   0  52  0   49736   13936 wait     I     1  0:00.11 /usr/local/bin/perl -w /usr/local/bin/automake-1.15 --add-missing --copy --force-missing
>> >    0 79962 79768   0  20  0   12368    1024 vnread   DL    1  0:00.00 cp /usr/local/share/automake-1.15/compile ./compile
>> >
>> > I am not sure if it's related to that OS version upgrade, but I have not
>> > seen any such issues on the same machine in 2-3 years of running
>> > essentially the same build process with versions 9.x, 10.0, 10.1 and 10.2.
>> >
>> > $ uname -a
>> > FreeBSD van01.sippysoft.com 10.3-PRERELEASE FreeBSD 10.3-PRERELEASE #1 80de3e2(master)-dirty: Tue Feb  2 12:19:57 PST 2016 sobomax@abc.sippysoft.com:/usr/obj/usr/home/sobomax/projects/freebsd103/sys/ABC  amd64
>> >
>> > The kernel stack trace is:
>> >
>> > (kgdb) thread 360
>> > [Switching to thread 360 (Thread 100515)]#0  0xffffffff8095244e in sched_switch ()
>> > (kgdb) bt
>> > #0  0xffffffff8095244e in sched_switch ()
>> > #1  0xffffffff809313b1 in mi_switch ()
>> > #2  0xffffffff8097089a in sleepq_wait ()
>> > #3  0xffffffff80930dd7 in _sleep ()
>> > #4  0xffffffff809b230e in bwait ()
>> > #5  0xffffffff80b511f3 in vnode_pager_generic_getpages ()
>> > #6  0xffffffff80dd1607 in VOP_GETPAGES_APV ()
>> > #7  0xffffffff80b4f59a in vnode_pager_getpages ()
>> > #8  0xffffffff80b30031 in vm_fault_hold ()
>> > #9  0xffffffff80b2f797 in vm_fault ()
>> > #10 0xffffffff80cb5a75 in trap_pfault ()
>> > #11 0xffffffff80cb51dd in trap ()
>> > #12 0xffffffff80c9b122 in calltrap ()
>> > #13 0xffffffff80cb36f1 in copyin ()
>> > #14 0xffffffff80977ddf in uiomove_faultflag ()
>> The backtrace indicates, with 99% certainty, that the issue is the
>> requested read never finishing.  But the backtrace is obviously not
>> complete, and there might be something more happening.  At least,
>> we have not handled page-ins during uiomove() on user io for quite
>> some time.
>>
>> If the vnode whose io hung is on UFS over md, you should look at the md
>> worker thread state.
>>
>> >
>> > The FS stack configuration is somewhat unique, so I am not sure if I am
>> > hitting some rare race condition or lock ordering issues specific to
>> that.
>> > It's basically ZFS (ZRAID) on top of a pair of SATA SSDs, with a big file on
>> > that FS attached via md(4) and UFS2 on that md(4). The build itself
>> runs in
>> > chroot with that UFS2 fs as its primary root.
>> >
>> > One additional bit of info: attempting to list the directory with
>> > that UFS image also got my bash process stuck in "zfs" state. The
>> > backtrace from that is:
>> A deadlock in the underlying io layer is consistent with this (secondary)
>> observation.
>>
>> >
>> > (kgdb) thread 353
>> > [Switching to thread 353 (Thread 100508)]#0  0xffffffff8095244e in sched_switch ()
>> > (kgdb) bt
>> > #0  0xffffffff8095244e in sched_switch ()
>> > #1  0xffffffff809313b1 in mi_switch ()
>> > #2  0xffffffff8097089a in sleepq_wait ()
>> > #3  0xffffffff809069ad in sleeplk ()
>> > #4  0xffffffff809060e0 in __lockmgr_args ()
>> > #5  0xffffffff809b8b7c in vop_stdlock ()
>> > #6  0xffffffff80dd0a3b in VOP_LOCK1_APV ()
>> > #7  0xffffffff809d6d23 in _vn_lock ()
>> > #8  0xffffffff81a8c9cd in ?? ()
>> > #9  0x0000000000000000 in ?? ()
>>
>>
>
>
> --
> Maksym Sobolyev
> Sippy Software, Inc.
> Internet Telephony (VoIP) Experts
> Tel (Canada): +1-778-783-0474
> Tel (Toll-Free): +1-855-747-7779
> Fax: +1-866-857-6942
> Web: http://www.sippysoft.com
> MSN: sales@sippysoft.com
> Skype: SippySoft
>


