From owner-freebsd-current Wed Nov 1 11: 9:37 2000
Delivered-To: freebsd-current@freebsd.org
Received: from blizzard.sabbo.net (blizzard.sabbo.net [193.193.218.18])
	by hub.freebsd.org (Postfix) with ESMTP
	id 3735137B479; Wed, 1 Nov 2000 11:09:28 -0800 (PST)
Received: from vic.sabbo.net (root@vic.sabbo.net [193.193.218.109])
	by blizzard.sabbo.net (8.10.1/8.10.1) with ESMTP id eA1KBCt05948;
	Wed, 1 Nov 2000 22:11:15 +0200
Received: from FreeBSD.org (big_brother.vega.com [192.168.1.1])
	by vic.sabbo.net (8.11.0/8.9.3) with ESMTP id eA1J9MM16667;
	Wed, 1 Nov 2000 21:09:22 +0200 (EET)
	(envelope-from sobomax@FreeBSD.org)
Message-ID: <3A006A58.E8315ABA@FreeBSD.org>
Date: Wed, 01 Nov 2000 21:09:12 +0200
From: Maxim Sobolev
Organization: Vega International Capital
X-Mailer: Mozilla 4.76 [en] (WinNT; U)
X-Accept-Language: uk,ru,en
MIME-Version: 1.0
To: John Polstra
Cc: current@FreeBSD.org, obrien@FreeBSD.org, deischen@FreeBSD.org
Subject: Re: ABI is broken??
References: <3A005026.47B9978C@FreeBSD.org> <200011011835.eA1IZl207585@vashon.polstra.com>
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

John Polstra wrote:

> In article <3A005026.47B9978C@FreeBSD.org>,
> Maxim Sobolev wrote:
> >
> > I'm not sure what exactly caused this behaviour (I can guess two potential
> > victims: O'Brien's changes in crt stuff and recent Polstra's changes in
> > libgcc_r), but it seems that some programs built on the previous -current from
> > 27 October immediately segfault when I'm trying to run them on system installed
> > from today's sources. The segfault disappeared when I recompiled affected
> > program. With this message I'm attaching short backtrace.
> [...]
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
> > (gdb) bt
> > #0 0x287de417 in pthread_mutex_lock () from /usr/lib/libc_r.so.4
> > #1 0x806e782 in __register_frame_info ()
> > #2 0x287a3137 in _init () from /usr/lib/libc_r.so.4
> > #3 0x2879ffe5 in _init () from /usr/lib/libc_r.so.4
> > #4 0x280797fd in _rtld () from /usr/libexec/ld-elf.so.1
>
> Here are all the random facts which, when put together, explain what
> is going on.
>
> Your old application was (like all -pthread programs) linked
> with "/usr/lib/libgcc_r.a". That library contains a function
> "__register_frame_info" which uses some of the facilities of the
> pthreads library "libc_r".
>
> The pthreads library has to be initialized before it can be used, by
> a call to _thread_init. If some functions such as pthread_mutex_lock
> are called before the library has been initialized, a segmentation
> violation results.
>
> _thread_init is called automatically from libc_r's _init function
> when the dynamic linker loads the library. Unfortunately, that
> isn't early enough. libgcc_r is the first thing to be initialized,
> and it calls pthread_mutex_lock before _thread_init has been called.
> Or rather I should say that OLD versions of libgcc_r did that --
> because they were buggy.
>
> In other words, your old application was linked with a buggy version
> of libgcc_r, but it didn't become apparent until now.
>
> It didn't become apparent until now because our crtbegin.o and
> crtend.o were also buggy. They failed to call __register_frame_info.
> This was a problem for C++ programs using exceptions, especially when
> the gcc port was used and DWARF2 exception handling was selected.
>
> Now we have fixed crtbegin.o and crtend.o, and we have fixed
> libgcc_r.a. But it causes problems for your old application because
> the new crtbegin.o and crtend.o (linked into the new shared libraries
> such as libc_r) call __register_frame_info in your old, buggy,
> statically linked libgcc_r.a.
>
> Are you dizzy yet? To sum up, your old executable contains the bug but
> it wasn't triggered until the recent changes.
>
> Now, what can or should we do about this? Arguably we should simply
> say in the release notes, "Relink your old multithreaded applications.
> They had a bug which is now fixed." But if there are binary-only
> commercial apps which exhibit the problem, this solution is useless.
> I don't know whether there are any such apps, but I doubt it. N.B.,
> Linux apps don't count because they were never linked with our
> libgcc_r in the first place.
>
> Or we can try to work around it, but there aren't any perfectly nice
> ways to do so. Here are some possibilities:
>
> - Put a hack in the threads library so that whenever
>   pthread_mutex_lock is called it checks to make sure that the
>   threads library has been initialized, and if not, it calls
>   _thread_init. This is a poor solution because it adds overhead to
>   a rather performance-critical function -- though admittedly the
>   overhead is very small. Another potential problem is that there
>   could be a race condition if several threads all called
>   pthread_mutex_lock at once before the threads library had been
>   initialized. I don't think the race condition would materialize,
>   though, since the first call would come from libgcc_r, well before
>   the application had gotten control.
>
> - Put a hack into the dynamic linker to call _thread_init very early
>   if that symbol was defined. I like this solution even less,
>   because it's too hackish. The dynamic linker isn't the place for
>   special hooks like that.
>
> - Put a hack into crtbegin.o or crtend.o. But we are using the
>   standard GNU versions of these, and I really really don't want to
>   change that. In any case, it's the wrong place for the
>   work-around.
>
> Overall I would lean toward putting the hack into pthread_mutex_lock.
> Comments?

Huh, why can't we just bump the libc_r version number and put the older
(buggy) version into lib/compat as usual?
This would not require any ugly hacks at all.

-Maxim
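
For concreteness, the first work-around quoted above (checking inside
pthread_mutex_lock whether the threads library has been initialized) could
look roughly like the sketch below. It is a minimal illustration in C and
not the libc_r code: toy_thread_init, toy_init_count, toy_mutex_lock,
toy_mutex_unlock and both structs are hypothetical stand-ins for
_thread_init and pthread_mutex_lock, the toy lock never blocks, and the
sketch is single-threaded, so it says nothing about the possible race
mentioned in the quote.

```c
#include <assert.h>
#include <stddef.h>

/* Every name below is hypothetical; nothing here is taken from libc_r. */

struct toy_thread { int id; };
struct toy_mutex  { struct toy_thread *owner; };

static struct toy_thread  toy_initial_thread;
static struct toy_thread *toy_current = NULL; /* NULL until toy_thread_init() has run */
static int toy_init_calls = 0;                /* counts real initializations */

/* Stand-in for _thread_init: sets up the initial thread exactly once. */
void toy_thread_init(void)
{
    if (toy_current != NULL)
        return;                      /* already initialized, nothing to do */
    toy_initial_thread.id = 1;
    toy_current = &toy_initial_thread;
    toy_init_calls++;
}

/* Reports how many times the library was actually initialized. */
int toy_init_count(void)
{
    return toy_init_calls;
}

/* Stand-in for pthread_mutex_lock with the proposed guard added. */
int toy_mutex_lock(struct toy_mutex *m)
{
    assert(m != NULL);
    if (toy_current == NULL)         /* the guard: one pointer comparison */
        toy_thread_init();
    if (m->owner != NULL)
        return -1;                   /* toy behaviour: report busy, never block */
    m->owner = toy_current;
    return 0;
}

/* Releases the mutex if the current (toy) thread holds it. */
int toy_mutex_unlock(struct toy_mutex *m)
{
    assert(m != NULL);
    if (m->owner == NULL || m->owner != toy_current)
        return -1;
    m->owner = NULL;
    return 0;
}
```

Locking a zero-initialized toy_mutex before anything else has called
toy_thread_init triggers the initialization once; every later call pays
only for the pointer comparison, which is the small overhead the quoted
proposal refers to.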