Date:      Fri, 11 Apr 2014 17:15:26 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Karl Pielorz <kpielorz_lst@tdx.co.uk>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: Stuck CLOSED sockets / sshd / zombies...
Message-ID:  <20140411141526.GT21331@kib.kiev.ua>
In-Reply-To: <652B8CA4866C0B9E4650430B@Mail-PC.tdx.co.uk>
References:  <20140408212319.GC21331@kib.kiev.ua> <D0B81EA30BF8126B37F98D18@study64.tdx.co.uk> <20140409084951.GE21331@kib.kiev.ua> <2A722BB3B12E0D80CA9FF075@Mail-PC.tdx.co.uk> <20140409111917.GH21331@kib.kiev.ua> <851413886E3982D2CCFEA9D9@Mail-PC.tdx.co.uk> <20140410184855.GP21331@kib.kiev.ua> <211BD03C086DDB1A07FDF036@Mail-PC.tdx.co.uk> <20140411131649.GR21331@kib.kiev.ua> <652B8CA4866C0B9E4650430B@Mail-PC.tdx.co.uk>

On Fri, Apr 11, 2014 at 02:50:22PM +0100, Karl Pielorz wrote:
>
>
> --On 11 April 2014 16:16 +0300 Konstantin Belousov <kostikbel@gmail.com>
> wrote:
>
> > On Fri, Apr 11, 2014 at 01:39:54PM +0100, Karl Pielorz wrote:
> >>
> >> Ok, rebuilt a debug world (with your rtld-elf patch), installed it -
> >> reproduced the issue, and ran up gdb on a 'urdlck' stuck sshd, and got
> >> the  trace below.
> > The trace looks reasonable.
>
> Great :)
>
> > I vaguely remember that you already answered this, but I want to start
> > investigating from a different angle.  Please show me the output
> > of 'ldd /usr/sbin/sshd' on your machine.  This happens on stable/10,
> > right?
>
> "
> ldd /usr/sbin/sshd
> /usr/sbin/sshd:
>         libssh.so.5 => /usr/lib/private/libssh.so.5 (0x800860000)
>         libutil.so.9 => /lib/libutil.so.9 (0x800abb000)
>         libwrap.so.6 => /usr/lib/libwrap.so.6 (0x800ccd000)
>         libpam.so.5 => /usr/lib/libpam.so.5 (0x800ed6000)
>         libbsm.so.3 => /usr/lib/libbsm.so.3 (0x8010e2000)
>         libgssapi_krb5.so.10 => /usr/lib/libgssapi_krb5.so.10 (0x8012fc000)
>         libgssapi.so.10 => /usr/lib/libgssapi.so.10 (0x80151a000)
>         libkrb5.so.11 => /usr/lib/libkrb5.so.11 (0x801723000)
>         libhx509.so.11 => /usr/lib/libhx509.so.11 (0x801999000)
>         libasn1.so.11 => /usr/lib/libasn1.so.11 (0x801be1000)
>         libcom_err.so.5 => /usr/lib/libcom_err.so.5 (0x801e7a000)
>         libroken.so.11 => /usr/lib/libroken.so.11 (0x80207c000)
>         libwind.so.11 => /usr/lib/libwind.so.11 (0x80228d000)
>         libheimbase.so.11 => /usr/lib/libheimbase.so.11 (0x8024b5000)
>         libheimipcc.so.11 => /usr/lib/private/libheimipcc.so.11 (0x8026b9000)
>         libcrypt.so.5 => /lib/libcrypt.so.5 (0x8028bb000)
>         libcrypto.so.7 => /lib/libcrypto.so.7 (0x802adb000)
>         libz.so.6 => /lib/libz.so.6 (0x802ec6000)
>         libc.so.7 => /lib/libc.so.7 (0x8030db000)
>         libldns.so.5 => /usr/lib/private/libldns.so.5 (0x803474000)
>         libmd.so.6 => /lib/libmd.so.6 (0x8036c8000)
>         libthr.so.3 => /lib/libthr.so.3 (0x8038d8000)
> "
So my suspicion seems to be correct. In the ldd output, libc appears
before libthr in the global order, so the libc sigaction() symbol is
resolved before the libthr interposer. As a result, the libthr wrapper
for signal handlers, thr_sighandler(), is not installed as the recipient
of the kernel signal, which prevents the libthr locks for rtld from
working properly.

You can see this in the backtrace: thr_sighandler() is missing from it
even though a signal handler is obviously active.
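
The ordering check can be automated. Below is a minimal sketch, assuming (as
above) that the order in which ldd lists the libraries reflects the global
order. The function names and the two shortened sample listings are
illustrative only; the sample lines are taken from the ldd outputs in this
message.

```python
import re


def load_order(ldd_output):
    """Return the library names in the order ldd lists them."""
    names = []
    for line in ldd_output.splitlines():
        # Dependency lines look like: "libc.so.7 => /lib/libc.so.7 (0x...)"
        match = re.match(r"\s*(\S+)\s+=>", line)
        if match:
            names.append(match.group(1))
    return names


def libthr_precedes_libc(ldd_output):
    """True if libthr is listed before libc; None if either is missing."""
    order = load_order(ldd_output)
    # "libc.so" does not match libcrypt, libcrypto or libcom_err.
    thr = [i for i, name in enumerate(order) if name.startswith("libthr.so")]
    libc = [i for i, name in enumerate(order) if name.startswith("libc.so")]
    if not thr or not libc:
        return None
    return thr[0] < libc[0]


# Shortened sample: the order reported on the affected machine.
BROKEN = """\
/usr/sbin/sshd:
        libz.so.6 => /lib/libz.so.6 (0x802ec6000)
        libc.so.7 => /lib/libc.so.7 (0x8030db000)
        libldns.so.5 => /usr/lib/private/libldns.so.5 (0x803474000)
        libmd.so.6 => /lib/libmd.so.6 (0x8036c8000)
        libthr.so.3 => /lib/libthr.so.3 (0x8038d8000)
"""

# Shortened sample: the order seen after linking sshd with -lpthread.
PATCHED = """\
/usr/sbin/sshd:
        libz.so.6 => /lib/libz.so.6 (0x802f0d000)
        libthr.so.3 => /lib/libthr.so.3 (0x803123000)
        libc.so.7 => /lib/libc.so.7 (0x803348000)
        libldns.so.5 => /usr/lib/private/libldns.so.5 (0x8036d1000)
        libmd.so.6 => /lib/libmd.so.6 (0x803926000)
"""

print(libthr_precedes_libc(BROKEN))   # False
print(libthr_precedes_libc(PATCHED))  # True
```

Feeding it the real output of 'ldd /usr/sbin/sshd' instead of the samples
shows whether a given build has libthr ahead of libc. It only inspects the
listing order and does not prove which sigaction() is actually bound.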

>
> The box is stable/10 - quite an old stable 10 now, but afaik other people
> have hit a similar issue on newer stable 10's - I've not updated this box,
> as I've seen nothing to say it's "fixed" in newer versions [and it's
> obviously been under investigation for weeks now on this machine as well,
> long before I posted to -hackers]. I can update to a newer version (e.g.
> today) if you want.
Better not: that keeps the environment stable and avoids the problem
magically disappearing.  It does seem to be consistent enough, though; on
the HEAD box I see the same order of needed libraries.

>
> > I do not see any linking with libpthread in the sshd Makefile.  Could it
> > be that libthr is loaded as dependency of some pam module ?
>
> Possibly - I don't know. This is stock FreeBSD #10 Stable - i.e. I've not
> configured anything differently on SSH than what you get 'out the box'.
> I've never done anything with PAM - so I don't know where I'd go checking
> that kind of thing (but can if you point me in the right direction).

To confirm or refute my theory, please apply the patch below in addition to
the previous patch, and rebuild only sshd:
# cd src/secure/usr.sbin/sshd && make clean all install
The patch changes the order of initialization; for my build I got
sandy% ldd /usr/sbin/sshd
/usr/sbin/sshd:
        libssh.so.5 => /usr/lib/private/libssh.so.5 (0x800863000)
        libutil.so.9 => /lib/libutil.so.9 (0x800af0000)
...
        libz.so.6 => /lib/libz.so.6 (0x802f0d000)
        libthr.so.3 => /lib/libthr.so.3 (0x803123000)
        libc.so.7 => /lib/libc.so.7 (0x803348000)
        libldns.so.5 => /usr/lib/private/libldns.so.5 (0x8036d1000)
        libmd.so.6 => /lib/libmd.so.6 (0x803926000)
which could be enough to prevent the bug.

Please retest and report.

diff --git a/secure/usr.sbin/sshd/Makefile b/secure/usr.sbin/sshd/Makefile
index 4f730a9..5e399fa 100644
--- a/secure/usr.sbin/sshd/Makefile
+++ b/secure/usr.sbin/sshd/Makefile
@@ -54,8 +54,8 @@ LDADD+=	 -lgssapi_krb5 -lgssapi -lkrb5 -lhx509 -lasn1 \
 CFLAGS+= -DNONE_CIPHER_ENABLED
 .endif
 
-DPADD+= ${LIBCRYPT} ${LIBCRYPTO} ${LIBZ}
-LDADD+= -lcrypt -lcrypto -lz
+DPADD+= ${LIBCRYPT} ${LIBCRYPTO} ${LIBZ} ${LIBPTHREAD}
+LDADD+= -lcrypt -lcrypto -lz -lpthread
 
 .if defined(LOCALBASE)
 CFLAGS+= -DXAUTH_PATH=\"${LOCALBASE}/bin/xauth\"



