From owner-freebsd-current  Wed Nov  1 11: 9:37 2000
Delivered-To: freebsd-current@freebsd.org
Received: from blizzard.sabbo.net (blizzard.sabbo.net [193.193.218.18])
	by hub.freebsd.org (Postfix) with ESMTP
	id 3735137B479; Wed,  1 Nov 2000 11:09:28 -0800 (PST)
Received: from vic.sabbo.net (root@vic.sabbo.net [193.193.218.109])
	by blizzard.sabbo.net (8.10.1/8.10.1) with ESMTP id eA1KBCt05948;
	Wed, 1 Nov 2000 22:11:15 +0200
Received: from FreeBSD.org (big_brother.vega.com [192.168.1.1])
	by vic.sabbo.net (8.11.0/8.9.3) with ESMTP id eA1J9MM16667;
	Wed, 1 Nov 2000 21:09:22 +0200 (EET)
	(envelope-from sobomax@FreeBSD.org)
Message-ID: <3A006A58.E8315ABA@FreeBSD.org>
Date: Wed, 01 Nov 2000 21:09:12 +0200
From: Maxim Sobolev <sobomax@FreeBSD.org>
Organization: Vega International Capital
X-Mailer: Mozilla 4.76 [en] (WinNT; U)
X-Accept-Language: uk,ru,en
MIME-Version: 1.0
To: John Polstra <jdp@polstra.com>
Cc: current@FreeBSD.org, obrien@FreeBSD.org, deischen@FreeBSD.org
Subject: Re: ABI is broken??
References: <3A005026.47B9978C@FreeBSD.org> <200011011835.eA1IZl207585@vashon.polstra.com>
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

John Polstra wrote:

> In article <3A005026.47B9978C@FreeBSD.org>,
> Maxim Sobolev  <sobomax@FreeBSD.ORG> wrote:
> >
> > I'm not sure what exactly caused this behaviour (I can guess two potential
> > culprits: O'Brien's recent changes to the crt files and Polstra's recent
> > changes to libgcc_r), but it seems that some programs built on the previous
> > -current from 27 October immediately segfault when I try to run them on a
> > system installed from today's sources. The segfault disappears when I
> > recompile the affected program. I'm attaching a short backtrace to this message.
> [...]
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
> > (gdb) bt
> > #0  0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
> > #1  0x806e782 in __register_frame_info ()
> > #2  0x287a3137 in _init () from /usr/lib/libc_r.so.4
> > #3  0x2879ffe5 in _init () from /usr/lib/libc_r.so.4
> > #4  0x280797fd in _rtld () from /usr/libexec/ld-elf.so.1
>
> Here are all the random facts which, when put together, explain what
> is going on.
>
> Your old application was (like all -pthread programs) linked
> with "/usr/lib/libgcc_r.a".  That library contains a function
> "__register_frame_info" which uses some of the facilities of the
> pthreads library "libc_r".
>
> The pthreads library has to be initialized before it can be used, by
> a call to _thread_init.  If some functions such as pthread_mutex_lock
> are called before the library has been initialized, a segmentation
> violation results.
>
> _thread_init is called automatically from libc_r's _init function
> when the dynamic linker loads the library.  Unfortunately, that
> isn't early enough.  libgcc_r is the first thing to be initialized,
> and it calls pthread_mutex_lock before _thread_init has been called.
> Or rather I should say that OLD versions of libgcc_r did that --
> because they were buggy.
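
A toy C model of the ordering problem described above. It assumes nothing about the real internals of libc_r or libgcc_r: every name below is hypothetical, and the segfault is replaced by a counter so the effect can be observed. Running the old libgcc_r initializer before libc_r's _init produces one premature lock; the reverse order produces none.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: none of these names exist in libc_r or
 * libgcc_r; they only model the initialization order. */
static int toy_lib_initialized = 0;  /* set by toy_thread_init() */
static int toy_premature_locks = 0;  /* locks attempted before init */

/* Models _thread_init: must run before any other threads call. */
static void toy_thread_init(void)
{
    toy_lib_initialized = 1;
}

/* Models pthread_mutex_lock: the real function would touch
 * uninitialized state and crash; the toy only counts the mistake. */
static int toy_mutex_lock(void)
{
    if (!toy_lib_initialized) {
        toy_premature_locks++;
        return -1;
    }
    return 0;
}

/* Models the old, buggy libgcc_r code that locks a mutex early. */
static void toy_libgcc_r_init(void)
{
    (void)toy_mutex_lock();
}

/* Models libc_r's _init, which calls _thread_init. */
static void toy_libc_r_init(void)
{
    toy_thread_init();
}

/* Run the initializers in the given order, the way the dynamic
 * linker runs each object's _init; return the premature-lock count. */
static int toy_run_inits(void (*inits[])(void), size_t n)
{
    assert(inits != NULL || n == 0);
    toy_lib_initialized = 0;
    toy_premature_locks = 0;
    for (size_t i = 0; i < n; i++)
        inits[i]();
    return toy_premature_locks;
}
```

With the order { toy_libgcc_r_init, toy_libc_r_init }, toy_run_inits returns 1; swapping the two entries makes it return 0.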
>
> In other words, your old application was linked with a buggy version
> of libgcc_r, but it didn't become apparent until now.
>
> It didn't become apparent until now because our crtbegin.o and
> crtend.o were also buggy.  They failed to call __register_frame_info.
> This was a problem for C++ programs using exceptions, especially when
> the gcc port was used and DWARF2 exception handling was selected.
>
> Now we have fixed crtbegin.o and crtend.o, and we have fixed
> libgcc_r.a.  But it causes problems for your old application because
> the new crtbegin.o and crtend.o (linked into the new shared libraries
> such as libc_r) call __register_frame_info in your old, buggy,
> statically linked libgcc_r.a.
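
A simplified sketch of why the new libc_r ends up in the old libgcc_r code. Under ordinary ELF global symbol lookup the executable is searched before the shared libraries, so if the executable defines __register_frame_info (through the statically linked libgcc_r.a), a call from the crtbegin.o code inside libc_r binds to that copy. The struct and function names are hypothetical, and a real dynamic linker consults symbol tables rather than a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of global symbol lookup: the executable comes
 * first in the lookup scope, then shared libraries in load order,
 * and the first object that defines the symbol wins. */
struct toy_object {
    const char *name;
    int defines_register_frame_info; /* nonzero if defined here */
};

/* Return the index of the object whose definition a reference
 * (e.g. from crtbegin.o code linked into libc_r) binds to,
 * or -1 if no object in the scope defines the symbol. */
static int toy_resolve(const struct toy_object *scope, size_t n)
{
    assert(scope != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (scope[i].defines_register_frame_info)
            return (int)i;
    return -1;
}
```

For a scope of { old executable (defines it), libc_r.so.4 (does not) }, toy_resolve returns 0: the old executable's copy is the one that runs.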
>
> Are you dizzy yet?  To sum up, your old executable contains the bug but
> it wasn't triggered until the recent changes.
>
> Now, what can or should we do about this?  Arguably we should simply
> say in the release notes, "Relink your old multithreaded applications.
> They had a bug which is now fixed."  But if there are binary-only
> commercial apps which exhibit the problem, this solution is useless.
> I don't know whether there are any such apps, but I doubt it.  N.B.,
> Linux apps don't count because they were never linked with our
> libgcc_r in the first place.
>
> Or we can try to work around it, but there aren't any perfectly nice
> ways to do so.  Here are some possibilities:
>
> - Put a hack in the threads library so that whenever
>   pthread_mutex_lock is called it checks to make sure that the
>   threads library has been initialized, and if not, it calls
>   _thread_init.  This is a poor solution because it adds overhead to
>   a rather performance-critical function -- though admittedly the
>   overhead is very small.  Another potential problem is that there
>   could be a race condition if several threads all called
>   pthread_mutex_lock at once before the threads library had been
>   initialized.  I don't think the race condition would materialize,
>   though, since the first call would come from libgcc_r, well before
>   the application had gotten control.
>
> - Put a hack into the dynamic linker to call _thread_init very early
>   if that symbol was defined.  I like this solution even less,
>   because it's too hackish.  The dynamic linker isn't the place for
>   special hooks like that.
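
A sketch of the second option: a toy dynamic linker that, before running any object's ordinary initializer, calls an early hook in every object that defines one. The struct, field, and function names are hypothetical; a real rtld would have to look the hook up by symbol name (here, _thread_init), which is exactly the kind of special case objected to above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy loaded object: all names are hypothetical. */
struct toy_dso {
    void (*early_hook)(void); /* stands in for _thread_init; NULL if undefined */
    void (*init)(void);       /* stands in for the object's _init */
};

static int toy_thread_lib_ready = 0;
static int toy_early_lock_failures = 0;

/* Example hook: marks the threads library as initialized. */
static void toy_example_thread_init(void)
{
    toy_thread_lib_ready = 1;
}

/* Example of old libgcc_r code that needs the threads library. */
static void toy_example_libgcc_r_init(void)
{
    if (!toy_thread_lib_ready)
        toy_early_lock_failures++;
}

static void toy_start_objects(const struct toy_dso *dsos, size_t n)
{
    assert(dsos != NULL || n == 0);
    /* The special case: call every defined early hook first... */
    for (size_t i = 0; i < n; i++)
        if (dsos[i].early_hook != NULL)
            dsos[i].early_hook();
    /* ...then run the ordinary initializers in their usual order. */
    for (size_t i = 0; i < n; i++)
        if (dsos[i].init != NULL)
            dsos[i].init();
}
```

Even with the libgcc_r-like initializer listed first, the hook runs before it, so no early lock fails.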
>
> - Put a hack into crtbegin.o or crtend.o.  But we are using the
>   standard GNU versions of these, and I really really don't want to
>   change that.  In any case, it's the wrong place for the
>   work-around.
>
> Overall I would lean toward putting the hack into pthread_mutex_lock.
> Comments?

Huh, why can't we just bump the libc_r version number and put the older (buggy)
version into lib/compat as usual? This would not require any ugly hacks at all.
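
A toy illustration of why a version bump sidesteps the problem, assuming /usr/lib/compat is on the loader's search path and that the bumped library would be named libc_r.so.5 (both are assumptions for illustration). An old binary records the exact name libc_r.so.4, so it keeps loading the old copy, built with the old crtbegin.o that never calls __register_frame_info, while newly linked programs get libc_r.so.5.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical installed files and search path. */
struct toy_file {
    const char *dir;
    const char *name;
};

static const struct toy_file toy_installed[] = {
    { "/usr/lib",        "libc_r.so.5" }, /* new, fixed (hypothetical bump) */
    { "/usr/lib/compat", "libc_r.so.4" }, /* old copy kept for old binaries */
};

static const char *const toy_search_path[] = { "/usr/lib", "/usr/lib/compat" };

/* Find the exact versioned name a binary needs; write the full path
 * into buf and return 0, or return -1 if it is not installed. */
static int toy_find_library(const char *needed, char *buf, size_t len)
{
    assert(needed != NULL && buf != NULL && len > 0);
    for (size_t d = 0; d < sizeof toy_search_path / sizeof toy_search_path[0]; d++)
        for (size_t f = 0; f < sizeof toy_installed / sizeof toy_installed[0]; f++)
            if (strcmp(toy_installed[f].dir, toy_search_path[d]) == 0 &&
                strcmp(toy_installed[f].name, needed) == 0) {
                snprintf(buf, len, "%s/%s", toy_search_path[d], needed);
                return 0;
            }
    return -1;
}
```

Looking up "libc_r.so.4" yields "/usr/lib/compat/libc_r.so.4", and "libc_r.so.5" yields "/usr/lib/libc_r.so.5".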

-Maxim

