Date:      Wed, 2 Mar 2016 09:06:37 -0800
From:      Maxim Sobolev <sobomax@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Kirk McKusick <mckusick@mckusick.com>, peter@holm.cc, fs@freebsd.org
Subject:   Re: Process stuck in "vnread"
Message-ID:  <CAH7qZfvsjavZF1b%2BBP6%2B2itG8buqiObrqPVOBk7q7-9Z9%2BNS2Q@mail.gmail.com>
In-Reply-To: <20160302115707.GF67250@kib.kiev.ua>
References:  <CAH7qZfs3EwT8jnKyodHxF_5nK18MeLSaB_F-qqOfwV0MJMD7Vg@mail.gmail.com> <20160302095339.GB67250@kib.kiev.ua> <CAH7qZfs4jCiP=ARaZjGGW1XVa63a-oOkaWtCO1L1-Hk%2Bema7OQ@mail.gmail.com> <20160302115707.GF67250@kib.kiev.ua>

Konstantin, this is a nullfs mount placed on top of UFS, with the nullfs
pointing at part of the ZFS tree. I am not sure whether that is what you
are asking about.


storage/builder on /builder (zfs, local, nfsv4acls)

md0     vnode    3200M  /builder/tmp/sspicd_tmp.ufs

/dev/md0 on /builder/mnt (ufs, asynchronous, local, noatime)
/builder/usr/ports-bitbucket on /builder/mnt/usr/ports (nullfs, local)

So the stuck process is effectively copying the file
/builder/mnt/usr/local/share/automake-1.15/compile to
/builder/usr/ports-bitbucket/SOMETHING/./compile, and it runs chrooted
into /builder/mnt. It could be stuck in either the read path or the write
path. However, looking at the full kernel side of the stack trace of that
cp(1), I'd say it is probably the latter: the write would have to traverse
the top-level vfs/ufs first, then the nullfs layer, and then zfs, and
neither of the last two is compiled into the kernel, which is why there is
no proper traceback. The nullfs mount is there to give the chrooted process
access to the ZFS tree at the upper level, i.e. /builder/usr.

Unfortunately, I cannot find a way to figure out the specific system call
that cp got stuck in. Attempting to attach gdb causes gdb to hang in turn.
So unless somebody has other ideas on how to get some useful post-mortem
debugging information out of this situation, I'll have to restart the box
soon to recover it.

I will put your patch in and see if it helps. I'll also compile nullfs into
the kernel statically, so that if it hits again we at least have some
post-mortem evidence to work with.

----
(kgdb) thread 362
[Switching to thread 362 (Thread 100515)]#0  0xffffffff8095244e in sched_switch ()
(kgdb) bt
#0  0xffffffff8095244e in sched_switch ()
#1  0xffffffff809313b1 in mi_switch ()
#2  0xffffffff8097089a in sleepq_wait ()
#3  0xffffffff80930dd7 in _sleep ()
#4  0xffffffff809b230e in bwait ()
#5  0xffffffff80b511f3 in vnode_pager_generic_getpages ()
#6  0xffffffff80dd1607 in VOP_GETPAGES_APV ()
#7  0xffffffff80b4f59a in vnode_pager_getpages ()
#8  0xffffffff80b30031 in vm_fault_hold ()
#9  0xffffffff80b2f797 in vm_fault ()
#10 0xffffffff80cb5a75 in trap_pfault ()
#11 0xffffffff80cb51dd in trap ()
#12 0xffffffff80c9b122 in calltrap ()
#13 0xffffffff80cb36f1 in copyin ()
#14 0xffffffff80977ddf in uiomove_faultflag ()
#15 0xffffffff819f699c in ?? ()
#16 0xfffffe0468a861a0 in ?? ()
#17 0xfffff80000000000 in ?? ()
#18 0xfffffe0468a861a0 in ?? ()
#19 0xfffff80176b39420 in ?? ()
#20 0x0000000000000001 in ?? ()
#21 0xfffff801ee76f500 in ?? ()
#22 0xfffffe0468a86960 in ?? ()
#23 0x00000001e3a72d80 in ?? ()
#24 0xfffff80176b39420 in ?? ()
#25 0xfffff803e3a72d80 in ?? ()
#26 0xfffffe0468a86960 in ?? ()
#27 0xfffff801881130e8 in ?? ()
#28 0xfffff801ee76f500 in ?? ()
#29 0x0000000000001ca5 in ?? ()
#30 0xfffffe0468a86200 in ?? ()
#31 0xffffffff819f68b2 in ?? ()
#32 0x0000000000001ca5 in ?? ()
#33 0x0000000000001ca5 in ?? ()
#34 0xfffff80188113000 in ?? ()
#35 0xfffffe0468a86960 in ?? ()
#36 0xfffffe0468a86440 in ?? ()
#37 0xffffffff81a90a77 in ?? ()
#38 0xfffff80100000002 in ?? ()
#39 0x0000000181a6c5c2 in ?? ()
#40 0x0000000000000000 in ?? ()


On Wed, Mar 2, 2016 at 3:57 AM, Konstantin Belousov <kostikbel@gmail.com>
wrote:

> On Wed, Mar 02, 2016 at 03:02:02AM -0800, Maxim Sobolev wrote:
> > About the backtrace, indeed, looks like you are right and some portion of
> > it is not decoded properly, as it's loaded as a kernel module. The setup is
> > somewhat even more complicated, the /usr/ports is mounted via NULLFS, so in
> > this command:
> >
> > cp /usr/local/share/automake-1.15/compile ./compile
> >
> > The target (i.e. ./compile) here is a path on ZFS that is exported via
> > NULLFS, while the source is a file on UFS2->md->ZFS. This is probably the
> > reason stack trace is incomplete, both zfs.ko and nullfs.ko are loaded as
> > modules and the next few frames point towards those. Unfortunately I cannot
> > beat kgdb to read symbols from those .ko's and decode them.
>
> Is nullfs mount put over ZFS only ?  The backtrace you shown cannot
> happen for ZFS, since ZFS has its own pager vop.  In fact, I would
> agree that the backtrace is reasonable for nullfs over UFS upper vnode.
> The following patch should fix the 'paging while faulting on uiomove'
> issue for nullfs over UFS.
>
> Peter, could you, please, test the patch ?  It is purely nullfs change,
> and the most interesting situation is the ups' deadlock, but the whole
> set of nullfs tests would be good to check.
>
> diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
> index 64e1e29..49bae28 100644
> --- a/sys/fs/nullfs/null_vfsops.c
> +++ b/sys/fs/nullfs/null_vfsops.c
> @@ -199,7 +199,7 @@ nullfs_mount(struct mount *mp)
>         }
>         mp->mnt_kern_flag |= MNTK_LOOKUP_EXCL_DOTDOT;
>         mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
> -           MNTK_USES_BCACHE;
> +           (MNTK_USES_BCACHE | MNTK_NO_IOPF | MNTK_UNMAPPED_BUFS);
>         MNT_IUNLOCK(mp);
>         mp->mnt_data = xmp;
>         vfs_getnewfsid(mp);
>


