Date:      Thu, 16 Jun 2011 03:53:19 -0400
From:      Nathaniel W Filardo <nwf@cs.jhu.edu>
To:        freebsd-current@freebsd.org, freebsd-sparc64@freebsd.org
Subject:   Re: TLS bug?
Message-ID:  <20110616075319.GM31996@gradx.cs.jhu.edu>
In-Reply-To: <20110616073138.GL31996@gradx.cs.jhu.edu>
References:  <20110616073138.GL31996@gradx.cs.jhu.edu>

Atcht; it's late.  I forgot to mention that this system is a sparc64 V240
2-way SMP machine.  It's running a kernel from 9.0-CURRENT r222833+262af52:
Tue Jun  7 18:47:35 EDT 2011 and a userland from a little later.

Sorry about that.
--nwf;

On Thu, Jun 16, 2011 at 03:31:38AM -0400, Nathaniel W Filardo wrote:
> I have a few applications (bonnie++ and mysql, specifically, both from
> ports) which trip over the assertion in
> lib/libc/stdlib/malloc.c:/^_malloc_thread_cleanup that
> >   assert(tcache != (void *)(uintptr_t)1);
>
> I have patched malloc.c thus:
>
> > --- a/lib/libc/stdlib/malloc.c
> > +++ b/lib/libc/stdlib/malloc.c
> > @@ -1108,7 +1108,7 @@ static __thread arena_t           *arenas_map TLS_MODEL;
> >
> >  #ifdef MALLOC_TCACHE
> >  /* Map of thread-specific caches. */
> > -static __thread tcache_t       *tcache_tls TLS_MODEL;
> > +__thread tcache_t      *tcache_tls TLS_MODEL;
> >
> >  /*
> >   * Number of cache slots for each bin in the thread cache, or 0 if tcache is
> > @@ -6184,10 +6184,17 @@ _malloc_thread_cleanup(void)
> >  #ifdef MALLOC_TCACHE
> >         tcache_t *tcache = tcache_tls;
> >
> > +        fprintf(stderr, "_m_t_c for %d:%lu with %p\n",
> > +               getpid(),
> > +               (unsigned long) _pthread_self(),
> > +               tcache);
> > +
> >         if (tcache != NULL) {
> > -               assert(tcache != (void *)(uintptr_t)1);
> > -               tcache_destroy(tcache);
> > -               tcache_tls = (void *)(uintptr_t)1;
> > +               /* assert(tcache != (void *)(uintptr_t)1); */
> > +               if((uintptr_t)tcache != (uintptr_t)1) {
> > +                       tcache_destroy(tcache);
> > +                       tcache_tls = (void *)(uintptr_t)1;
> > +               }
>
> and libthr/thread/thr_create.c thus:
>
> > --- a/lib/libthr/thread/thr_create.c
> > +++ b/lib/libthr/thread/thr_create.c
> > @@ -243,6 +243,8 @@ create_stack(struct pthread_attr *pattr)
> >         return (ret);
> >  }
> >
> > +extern __thread void *tcache_tls;
> > +
> >  static void
> >  thread_start(struct pthread *curthread)
> >  {
> > @@ -280,6 +282,11 @@ thread_start(struct pthread *curthread)
> >                 curthread->attr.stacksize_attr;
> >  #endif
> >
> > +        fprintf(stderr, "t_s for %d:%lu with %p\n",
> > +                getpid(),
> > +                (unsigned long) _pthread_self(),
> > +                tcache_tls);
> > +
> >         /* Run the current thread's start routine with argument: */
> >         _pthread_exit(curthread->start_routine(curthread->arg));
> >
>
> to attempt to debug this issue.  With those changes in place, bonnie++'s
> execution looks like this:
>
> >[...]
> > Writing a byte at a time...done
> > Writing intelligently...done
> > Rewriting...done
> > Reading a byte at a time...done
> > Reading intelligently...done
> > t_s for 79654:1086343168 with 0x0
> > t_s for 79654:1086345216 with 0x0
> > t_s for 79654:1086346240 with 0x0
> > t_s for 79654:1086347264 with 0x0
> > t_s for 79654:1086344192 with 0x0
> > start 'em...done...done...done...done..._m_t_c for 79654:1086344192 with 0x41404400
> > _m_t_c for 79654:1086346240 with 0x40d2c400
> > _m_t_c for 79654:1086343168 with 0x41404200
> > _m_t_c for 79654:1086345216 with 0x41804200
> > done...
> > _m_t_c for 79654:1086347264 with 0x41004200
> > Create files in sequential order...done.
> > Stat files in sequential order...done.
> > Delete files in sequential order...done.
> > Create files in random order...done.
> > Stat files in random order...done.
> > Delete files in random order...done.
> > 1.96,1.96,hydra.priv.oc.ietfng.org,1,1308217772,10M,,7,81,2644,7,3577,14,34,93,+++++,+++,773.7,61,16,,,,,2325,74,13016,99,2342,86,3019,91,11888,99,2184,89,16397ms,1237ms,671ms,2009ms,177us,1305ms,489ms,1029us,270ms,140ms,53730us,250ms
> > Writing a byte at a time...done
> > Writing intelligently...done
> > Rewriting...done
> > Reading a byte at a time...done
> > Reading intelligently...done
> > t_s for 79654:1086343168 with 0x1
> > t_s for 79654:1086346240 with 0x1
> > t_s for 79654:1086345216 with 0x1
> > t_s for 79654:1086347264 with 0x1
> > t_s for 79654:1086344192 with 0x1
> > start 'em...done...done...done...done...done...
> > _m_t_c for 79654:1086347264 with 0x1
> > _m_t_c for 79654:1086344192 with 0x1
> > _m_t_c for 79654:1086343168 with 0x1
> >[...]
>
> So what seems to be happening is that the TLS area for later threads is
> being set up incorrectly: rather than being zeroed, tcache_tls starts out
> as 1, which means no tcache is ever allocated, so when the thread exits,
> the assert trips.
>
> Unfortunately, setting a breakpoint on __libc_allocate_tls seems to do bad
> things to the kernel (inducing a SIR, i.e. a software-initiated reset,
> without any panic message).  I am somewhat at a loss; help?
>
> Thanks in advance!
> --nwf;
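
To make the reasoning in the quoted analysis concrete, here is a minimal
single-threaded C model of the sentinel logic.  It is only a sketch:
struct tls_slot, slot_get and slot_cleanup are hypothetical names, and the
lazy-creation path in slot_get is an assumption about how the allocator
consumes tcache_tls, not code taken from malloc.c.  Only the checks in
slot_cleanup follow the unpatched lines shown in the diff.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sentinel from the quoted diff: this thread's cache was already torn down. */
#define TCACHE_DEAD ((void *)(uintptr_t)1)

/* Hypothetical stand-in for one thread's tcache_tls slot. */
struct tls_slot {
	void *tcache;
};

/*
 * Assumed allocation-path lookup: create the cache lazily when the slot is
 * NULL, and treat the sentinel as "this thread has no cache".
 */
static void *
slot_get(struct tls_slot *s)
{
	if (s->tcache == TCACHE_DEAD)
		return (NULL);
	if (s->tcache == NULL)
		s->tcache = malloc(64);	/* stands in for creating a tcache */
	return (s->tcache);
}

/*
 * Mirrors the unpatched cleanup checks.  Returns 0 on a clean exit and -1
 * where assert(tcache != (void *)(uintptr_t)1) would have fired.
 */
static int
slot_cleanup(struct tls_slot *s)
{
	if (s->tcache == NULL)
		return (0);
	if (s->tcache == TCACHE_DEAD)
		return (-1);
	free(s->tcache);
	s->tcache = TCACHE_DEAD;
	return (0);
}
```

A slot that starts at NULL goes through slot_get and slot_cleanup and ends up
holding the sentinel, with slot_cleanup returning 0.  A slot that already
holds the sentinel when the thread starts never gets a cache from slot_get,
and slot_cleanup returns -1, which corresponds to the second batch of
t_s/_m_t_c lines in the log.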


