Date:      Wed, 20 Mar 2013 11:49:54 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Jeremy Chadwick <jdc@koitsu.org>, Michael Landin Hostbaek <mich@FreeBSD.org>, freebsd-stable@FreeBSD.org, John Baldwin <jhb@FreeBSD.org>, Andriy Gapon <avg@FreeBSD.org>
Subject:   Re: Core Dump / panic sleeping thread
Message-ID:  <20130320094954.GV3794@kib.kiev.ua>
In-Reply-To: <153890828.4081736.1363736263509.JavaMail.root@erie.cs.uoguelph.ca>
References:  <5148A454.1080303@FreeBSD.org> <153890828.4081736.1363736263509.JavaMail.root@erie.cs.uoguelph.ca>


--UDXac3CCxvoffKng
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Mar 19, 2013 at 07:37:43PM -0400, Rick Macklem wrote:
> Andriy Gapon wrote:
> > on 19/03/2013 19:35 Jeremy Chadwick said the following:
> > > On Tue, Mar 19, 2013 at 06:18:06PM +0100, Michael Landin Hostbaek
> > > wrote:
> > [snip]
> > >> Unread portion of the kernel message buffer:
> > >> Sleeping thread (tid 100256, pid 85641) owns a non-sleepable lock
> > >> KDB: stack backtrace of thread 100256:
> > >> #0 0xffffffff808f2d46 at mi_switch+0x186
> > >> #1 0xffffffff8092bb52 at sleepq_wait+0x42
> > >> #2 0xffffffff808f34d6 at _sleep+0x376
> > >> #3 0xffffffff80b4f3ae at vm_object_page_remove+0x2ce
> > >> #4 0xffffffff80b5ac7d at vnode_pager_setsize+0x17d
> > >> #5 0xffffffff8082102c at nfscl_loadattrcache+0x2cc
> > >> #6 0xffffffff80818d37 at nfs_getattr+0x287
> > >> #7 0xffffffff8098f1c0 at vn_stat+0xb0
> > >> #8 0xffffffff809869d9 at kern_statat_vnhook+0xf9
> > >> #9 0xffffffff80986b55 at kern_statat+0x15
> > >> #10 0xffffffff80986c1a at sys_lstat+0x2a
> > >> #11 0xffffffff80bd7ae6 at amd64_syscall+0x546
> > >> #12 0xffffffff80bc3447 at Xfast_syscall+0xf7
> > >> panic: sleeping thread
> > >> cpuid = 0
> > >> KDB: stack backtrace:
> > >> #0 0xffffffff809208a6 at kdb_backtrace+0x66
> > >> #1 0xffffffff808ea8be at panic+0x1ce
> > >> #2 0xffffffff8092ed22 at propagate_priority+0x1d2
> > >> #3 0xffffffff8092fa4e at turnstile_wait+0x1be
> > >> #4 0xffffffff808d8d48 at _mtx_lock_sleep+0xd8
> > >> #5 0xffffffff80820fa4 at nfscl_loadattrcache+0x244
> > >> #6 0xffffffff8081758c at ncl_readrpc+0xac
> > >> #7 0xffffffff80824c45 at ncl_getpages+0x485
> > >> #8 0xffffffff80b5aa0c at vnode_pager_getpages+0x9c
> > >> #9 0xffffffff80b3fc93 at vm_fault_hold+0x673
> > >> #10 0xffffffff80b41cc3 at vm_fault+0x73
> > >> #11 0xffffffff80bd84b4 at trap_pfault+0x124
> > >> #12 0xffffffff80bd8c6c at trap+0x49c
> > >> #13 0xffffffff80bc315f at calltrap+0x8
> > [snip]
> >
> > I think that the regular mutex which is acquired via NFSLOCKNODE() in
> > nfscl_loadattrcache() can not be held across vnode_pager_setsize.
> > I am not sure though when vap->va_size != np->n_size case is
> > triggered.
> >
> Yep, I'd agree to that. The same bug is in the old NFS client and
> the new NFS client cribbed the code from there.
>
> I have attached a simple patch that unlocks the mutex for the
> vnode_pager_setsize() call. Maybe you could test it?
>
> Thanks for reporting this, rick
> ps: Hopefully "patch" can apply this patch (there have been
>     recent changes to this file, so the line#s could be off).
>     It should be easy to do manually if not. The change is
>     in nfscl_loadattrcache() in sys/fs/nfsclient/nfs_clport.c.
>
>
> > > You're going to need to provide the following details:
> > >
> > > 1. Contents of /etc/rc.conf
> > > 2. Contents of /etc/sysctl.conf (if modified)
> > > 3. Contents of /etc/fstab
> > > 4. ifconfig -a
> > > 5. OS used by the NFS server, and all configuration details
> > > pertaining
> > > to that system
> > >
> > > You may also be asked to upgrade to 9.1-STABLE, as there may be
> > > fixes
> > > for whatever this is in base/stable/9 that are not in -RELEASE, but
> > > this
> > > is speculative on my part.
> > >
> > I do not see a need for any of these.
> >
> > --
> > Andriy Gapon
> > _______________________________________________
> > freebsd-stable@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> > To unsubscribe, send any mail to
> > "freebsd-stable-unsubscribe@freebsd.org"

> --- fs/nfsclient/nfs_clport.c.savit	2013-03-19 18:37:33.000000000 -0400
> +++ fs/nfsclient/nfs_clport.c	2013-03-19 18:44:21.000000000 -0400
> @@ -444,7 +444,9 @@ nfscl_loadattrcache(struct vnode **vpp,
>  				np->n_size = vap->va_size;
>  				np->n_flag |= NSIZECHANGED;
>  			}
> +			NFSUNLOCKNODE(np);
>  			vnode_pager_setsize(vp, np->n_size);
> +			NFSLOCKNODE(np);
>  		} else {
>  			np->n_size = vap->va_size;
>  		}

I do not like it. As I said in the previous response to Andrey,
I think that moving the vnode_pager_setsize() call after the unlock is
better, since it reduces races with other threads seeing a half-done
attribute update or making an attribute change simultaneously.
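
Roughly what I mean, as a userspace sketch only. This is not the
nfs_clport.c code: a pthread mutex stands in for the nfsnode mutex,
pager_setsize() stands in for vnode_pager_setsize() (which may sleep),
and struct node, loadattrcache() and pager_size are names made up for
the illustration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the nfsnode fields of interest. */
struct node {
	pthread_mutex_t	n_mtx;		/* plays the role of the node mutex */
	uint64_t	n_size;		/* cached file size */
	bool		n_sizechanged;	/* plays the role of NSIZECHANGED */
};

/* Records the last size handed to the (pretend) pager. */
uint64_t pager_size;

/* Stand-in for a call that may sleep and so must run unlocked. */
void
pager_setsize(uint64_t nsize)
{
	pager_size = nsize;
}

void
loadattrcache(struct node *np, uint64_t va_size)
{
	uint64_t nsize = 0;
	bool setsize = false;

	pthread_mutex_lock(&np->n_mtx);
	if (va_size != np->n_size) {
		np->n_size = va_size;
		np->n_sizechanged = true;
		/* Only remember the new size, do not call the pager yet. */
		nsize = va_size;
		setsize = true;
	}
	/* ... the rest of the attribute update stays under the lock ... */
	pthread_mutex_unlock(&np->n_mtx);

	/* The possibly sleeping call is made with the mutex dropped. */
	if (setsize)
		pager_setsize(nsize);
}
```

The point is that the mutex is taken and dropped once, so the whole
attribute update is done in a single locked region, and the size is
passed to the pager from a local copy instead of re-reading n_size
without the lock.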

--UDXac3CCxvoffKng--


