Date:      Fri, 15 Feb 2013 16:20:58 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Marc Fournier <scrappy@hub.org>, Rick Macklem <rmacklem@uoguelph.ca>, freebsd-stable@freebsd.org
Subject:   Re: 9-STABLE -> NFS -> NetAPP:
Message-ID:  <20130215142058.GP2522@kib.kiev.ua>
In-Reply-To: <201302150844.43188.jhb@freebsd.org>
References:  <1964289267.3041689.1360897556427.JavaMail.root@erie.cs.uoguelph.ca> <201302150844.43188.jhb@freebsd.org>

On Fri, Feb 15, 2013 at 08:44:43AM -0500, John Baldwin wrote:
> On Thursday, February 14, 2013 10:05:56 pm Rick Macklem wrote:
> > Marc Fournier wrote:
> > > On 2013-02-13, at 3:54 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> > >
> > > >>
> > > > The pid that is in "T" state for the "ps auxlH".
> > >
> > > Different server, last kernel update on Jan 22nd, https process this
> > > time instead of du last time.
> > >
> > > I've attached:
> > >
> > > ps auxlH
> > > ps auxlH of just the processes that are in TJ state (6 httpd servers)
> > > procstat output for each of the 6 process
> > >
> > > They are included as attachments -- if these don't make it through, let
> > > me know, just figured I'd try and keep it compact ...
> > Well, I've looked at this call path a little closer:
> > 16693 104135 httpd            -                mi_switch+0x186 thread_suspend_check+0x19f sleepq_catch_signals+0x1c5
> >   sleepq_timedwait_sig+0x19 _sleep+0x2ca clnt_vc_call+0x763 clnt_reconnect_call+0xfb newnfs_request+0xadb
> >   nfscl_request+0x72 nfsrpc_accessrpc+0x1df nfs34_access_otw+0x56 nfs_access+0x306 vn_open_cred+0x5a8
> >   kern_openat+0x20a amd64_syscall+0x540 Xfast_syscall+0xf7
> >
> > I am probably way off, since I am not familiar with this stuff, but it
> > seems to me that thread_suspend_check() should just return 0 for the
> > case where stop_allowed == SIG_STOP_NOT_ALLOWED (TDF_SBDRY flag set)
> > instead of sitting in the loop and doing a mi_switch(). I'm not even
> > sure if it should call thread_suspend_check() for this case, but there
> > are cases in thread_suspend_check() that I don't understand.
> >
> > Although I don't really understand thread_suspend_check(), I've attached
> > a simple patch that might be a starting point for fixing this?
> >=20
> > I wouldn't recommend trying the patch until kib and/or jhb weigh in
> > on whether it makes any sense.
>
> I think this is the right idea, but in HEAD with the sigdeferstop() changes it
> should just check for TDF_SBDRY instead of adding a new parameter.  I think
> checking for TDF_SBDRY will work even in 9 (and will make the patch smaller).
> Also, I think this is only needed for stop signals.  Other suspend requests
> will eventually resume the thread, it is only stop signals that can cause the
> thread to get stuck indefinitely (since it depends on the user sending
> SIGCONT).
>
> Marc, are you using SIGSTOP?
>
> Index: kern_thread.c
> ===================================================================
> --- kern_thread.c	(revision 246122)
> +++ kern_thread.c	(working copy)
> @@ -795,6 +795,17 @@ thread_suspend_check(int return_instead)
>  			return (ERESTART);
>
>  		/*
> +		 * Ignore suspend requests for stop signals if they
> +		 * are deferred.
> +		 */
> +		if (P_SHOULDSTOP(p) == P_STOPPED_SIG &&
> +		    td->td_flags & TDF_SBDRY) {
> +			KASSERT(return_instead,
> +			    ("TDF_SBDRY set for unsafe thread_suspend_check"));
> +			return (0);
> +		}
> +
> +		/*
>  		 * If the process is waiting for us to exit,
>  		 * this thread should just suicide.
>  		 * Assumes that P_SINGLE_EXIT implies P_STOPPED_SINGLE.

This looks correct.
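
For readers who want to see the decision in isolation, here is a minimal
userland sketch of the check the patch adds. It is not kernel code: the
M_* constants, their values, and model_suspend_check() are hypothetical
names made up for illustration, and the model leaves out the
single-threading, exit, and EINTR/ERESTART paths that the real
thread_suspend_check() also handles. It only assumes, as the patch does,
that a deferred stop signal (TDF_SBDRY set, and the signal stop being the
only pending stop request) should make the function return 0 instead of
suspending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model; they are not the kernel's. */
enum {
	M_STOPPED_SIG    = 0x1,	/* stands in for P_STOPPED_SIG */
	M_STOPPED_SINGLE = 0x2,	/* stands in for P_STOPPED_SINGLE */
	M_STOPPED_TRACE  = 0x4,	/* stands in for P_STOPPED_TRACE */
	M_STOPPED_MASK   = M_STOPPED_SIG | M_STOPPED_SINGLE | M_STOPPED_TRACE,
	M_TDF_SBDRY      = 0x8	/* stands in for TDF_SBDRY */
};

enum m_action { M_CONTINUE, M_SUSPEND };

/*
 * Simplified model: decide whether a thread would suspend or keep
 * running, given the process stop flags and the thread flags.
 */
static enum m_action
model_suspend_check(int p_flag, int td_flags, bool return_instead)
{
	int stop = p_flag & M_STOPPED_MASK;

	/* No stop request pending: nothing to do. */
	if (stop == 0)
		return (M_CONTINUE);

	/*
	 * The case from the patch: the only pending request is a stop
	 * signal and the thread has stops deferred, so do not suspend.
	 */
	if (stop == M_STOPPED_SIG && (td_flags & M_TDF_SBDRY) != 0) {
		/* Mirrors the KASSERT in the patch. */
		assert(return_instead);
		return (M_CONTINUE);
	}

	/* Every other stop request still suspends the thread. */
	return (M_SUSPEND);
}
```

In this model a thread sleeping in an NFS RPC with stops deferred keeps
running when only a stop signal is pending, while a single-threading or
trace request still suspends it.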



