From: Nathaniel W Filardo
To: freebsd-current@freebsd.org, freebsd-sparc64@freebsd.org
Date: Thu, 16 Jun 2011 03:53:19 -0400
Subject: Re: TLS bug?
Message-ID: <20110616075319.GM31996@gradx.cs.jhu.edu>
In-Reply-To: <20110616073138.GL31996@gradx.cs.jhu.edu>

Atcht; it's late.  I forgot to mention that this system is a sparc64 V240
2-way SMP machine.
It's running a kernel from 9.0-CURRENT r222833+262af52: Tue Jun 7 18:47:35
EDT 2011 and a userland from a little later.  Sorry about that.
--nwf;

On Thu, Jun 16, 2011 at 03:31:38AM -0400, Nathaniel W Filardo wrote:
> I have a few applications (bonnie++ and mysql, specifically, both from
> ports) which trip over the assertion in
> lib/libc/stdlib/malloc.c:/^_malloc_thread_cleanup that
>
>   assert(tcache != (void *)(uintptr_t)1);
>
> I have patched malloc.c thus:
>
> > --- a/lib/libc/stdlib/malloc.c
> > +++ b/lib/libc/stdlib/malloc.c
> > @@ -1108,7 +1108,7 @@ static __thread arena_t *arenas_map TLS_MODEL;
> >
> >  #ifdef MALLOC_TCACHE
> >  /* Map of thread-specific caches. */
> > -static __thread tcache_t *tcache_tls TLS_MODEL;
> > +__thread tcache_t *tcache_tls TLS_MODEL;
> >
> >  /*
> >   * Number of cache slots for each bin in the thread cache, or 0 if tcache
> >   * is
> > @@ -6184,10 +6184,17 @@ _malloc_thread_cleanup(void)
> >  #ifdef MALLOC_TCACHE
> >  	tcache_t *tcache = tcache_tls;
> >
> > +	fprintf(stderr, "_m_t_c for %d:%lu with %p\n",
> > +		getpid(),
> > +		(unsigned long) _pthread_self(),
> > +		tcache);
> > +
> >  	if (tcache != NULL) {
> > -		assert(tcache != (void *)(uintptr_t)1);
> > -		tcache_destroy(tcache);
> > -		tcache_tls = (void *)(uintptr_t)1;
> > +		/* assert(tcache != (void *)(uintptr_t)1); */
> > +		if((uintptr_t)tcache != (uintptr_t)1) {
> > +			tcache_destroy(tcache);
> > +			tcache_tls = (void *)(uintptr_t)1;
> > +		}
>
> and libthr/thread/thr_create.c thus:
>
> > --- a/lib/libthr/thread/thr_create.c
> > +++ b/lib/libthr/thread/thr_create.c
> > @@ -243,6 +243,8 @@ create_stack(struct pthread_attr *pattr)
> >  	return (ret);
> >  }
> >
> > +extern __thread void *tcache_tls;
> > +
> >  static void
> >  thread_start(struct pthread *curthread)
> >  {
> > @@ -280,6 +282,11 @@ thread_start(struct pthread *curthread)
> >  	    curthread->attr.stacksize_attr;
> >  #endif
> >
> > +	fprintf(stderr, "t_s for %d:%lu with %p\n",
> > +		getpid(),
> > +		(unsigned long) _pthread_self(),
> > +		tcache_tls);
> > +
> >  	/* Run the current thread's start routine with argument: */
> >  	_pthread_exit(curthread->start_routine(curthread->arg));
> >
>
> to attempt to debug this issue.  With those changes in place, bonnie++'s
> execution looks like this:
>
> >[...]
> > Writing a byte at a time...done
> > Writing intelligently...done
> > Rewriting...done
> > Reading a byte at a time...done
> > Reading intelligently...done
> > t_s for 79654:1086343168 with 0x0
> > t_s for 79654:1086345216 with 0x0
> > t_s for 79654:1086346240 with 0x0
> > t_s for 79654:1086347264 with 0x0
> > t_s for 79654:1086344192 with 0x0
> > start 'em...done...done...done...done..._m_t_c for 79654:1086344192 with
> > 0x41404400
> > _m_t_c for 79654:1086346240 with 0x40d2c400
> > _m_t_c for 79654:1086343168 with 0x41404200
> > _m_t_c for 79654:1086345216 with 0x41804200
> > done...
> > _m_t_c for 79654:1086347264 with 0x41004200
> > Create files in sequential order...done.
> > Stat files in sequential order...done.
> > Delete files in sequential order...done.
> > Create files in random order...done.
> > Stat files in random order...done.
> > Delete files in random order...done.
> > 1.96,1.96,hydra.priv.oc.ietfng.org,1,1308217772,10M,,7,81,2644,7,3577,14,34,93,+++++,+++,773.7,61,16,,,
> > ,,2325,74,13016,99,2342,86,3019,91,11888,99,2184,89,16397ms,1237ms,671ms,2009ms,177us,1305ms,489ms,1029
> > us,270ms,140ms,53730us,250ms
> > Writing a byte at a time...done
> > Writing intelligently...done
> > Rewriting...done
> > Reading a byte at a time...done
> > Reading intelligently...done
> > t_s for 79654:1086343168 with 0x1
> > t_s for 79654:1086346240 with 0x1
> > t_s for 79654:1086345216 with 0x1
> > t_s for 79654:1086347264 with 0x1
> > t_s for 79654:1086344192 with 0x1
> > start 'em...done...done...done...done...done...
> > _m_t_c for 79654:1086347264 with 0x1
> > _m_t_c for 79654:1086344192 with 0x1
> > _m_t_c for 79654:1086343168 with 0x1
> >[...]
>
> So what seems to be happening is that the TLS area is being set up
> incorrectly, eventually: rather than zeroing the tcache_tls value, it is
> being set to 1, which means no tcache is ever allocated, so when we get
> around to exiting, the assert trips.
>
> Unfortunately, setting a breakpoint on __libc_allocate_tls seems to do bad
> things to the kernel (inducing a SIR without any panic message).  I am
> somewhat at a loss; help?
>
> Thanks in advance!
> --nwf;
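For anyone trying to reproduce this without bonnie++ or mysql, here is a minimal C sketch of the invariant the trace appears to break: every newly created thread should find its __thread pointer zeroed, so the cleanup path should never see the sentinel value 1 left over from an earlier thread. This is not code from malloc.c or libthr; demo_tcache_tls, DEMO_DESTROYED, demo_thread_cleanup() and demo_count_bad_threads() are hypothetical names that only mimic the shape of tcache_tls and _malloc_thread_cleanup() in the quoted patch. It assumes a compiler that accepts the __thread qualifier and linking with -pthread.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_THREADS 16

/* Hypothetical sentinel meaning "this thread's cache was already
 * destroyed"; same value the quoted malloc.c stores in tcache_tls. */
#define DEMO_DESTROYED ((void *)(uintptr_t)1)

/* Hypothetical stand-in for tcache_tls: the runtime must give every new
 * thread a zero (NULL) copy of this variable. */
static __thread void *demo_tcache_tls;

struct demo_result {
	int saw_null_at_start;	/* 1 if the TLS slot was NULL on thread entry */
	int cleanup_ok;		/* 1 if cleanup did not find the sentinel */
};

/* Same shape as _malloc_thread_cleanup(): destroy once, then mark the slot.
 * Returns 0 in exactly the case where the real code's assert() fires. */
static int
demo_thread_cleanup(void)
{
	void *tc = demo_tcache_tls;

	if (tc == NULL)
		return 1;		/* nothing was ever allocated */
	if (tc == DEMO_DESTROYED)
		return 0;		/* stale sentinel: the reported failure */
	free(tc);
	demo_tcache_tls = DEMO_DESTROYED;
	return 1;
}

static void *
demo_worker(void *arg)
{
	struct demo_result *r = arg;

	r->saw_null_at_start = (demo_tcache_tls == NULL);
	if (demo_tcache_tls == NULL)
		demo_tcache_tls = malloc(64);	/* lazily "create a tcache" */
	r->cleanup_ok = demo_thread_cleanup();
	return NULL;
}

/* Runs `rounds` batches of `nthreads` short-lived threads and returns how
 * many did not start with a zeroed slot or hit the sentinel in cleanup,
 * or -1 on bad arguments or if pthread_create() fails. */
static int
demo_count_bad_threads(int rounds, int nthreads)
{
	pthread_t tid[DEMO_MAX_THREADS];
	struct demo_result res[DEMO_MAX_THREADS];
	int bad = 0;

	if (rounds < 0 || nthreads < 1 || nthreads > DEMO_MAX_THREADS)
		return -1;
	for (int round = 0; round < rounds; round++) {
		int created = 0;

		for (int i = 0; i < nthreads; i++) {
			res[i].saw_null_at_start = 0;
			res[i].cleanup_ok = 0;
			if (pthread_create(&tid[i], NULL, demo_worker,
			    &res[i]) != 0)
				break;
			created++;
		}
		/* Join whatever was started, even after a failed create. */
		for (int i = 0; i < created; i++) {
			pthread_join(tid[i], NULL);
			if (!res[i].saw_null_at_start || !res[i].cleanup_ok)
				bad++;
		}
		if (created != nthreads)
			return -1;
	}
	return bad;
}
```

Calling demo_count_bad_threads(2, 5) from a small main() loosely mimics the two bonnie++ rounds of five threads in the trace. With a thread library that re-initialises TLS for each new thread it should return 0; whether the sparc64 behaviour above reproduces this way is an untested assumption, but if it does, the second-round threads would be counted because they start with 0x1 instead of NULL.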