Date:      21 Oct 2002 22:31:48 +0200
From:      Linus Kendall <linus@angliaab.se>
To:        Peter Pentchev <roam@ringlet.net>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: PThreads problem
Message-ID:  <1035232308.24315.37.camel@bilbo>
In-Reply-To: <20021021194453.GB377@straylight.oblivion.bg>
References:  <1035200159.24315.13.camel@bilbo> <20021021124520.GS389@straylight.oblivion.bg> <1035206648.24315.20.camel@bilbo> <20021021134834.GA41198@straylight.oblivion.bg> <20021021135045.GB41198@straylight.oblivion.bg> <1035218026.24330.33.camel@bilbo>  <20021021194453.GB377@straylight.oblivion.bg>

On Mon, 2002-10-21 at 21:44, Peter Pentchev wrote:
> On Mon, Oct 21, 2002 at 06:33:46PM +0200, Linus Kendall wrote:
> > Answer inline below.
> >
> > On Mon, 2002-10-21 at 15:50, Peter Pentchev wrote:
> > > On Mon, Oct 21, 2002 at 04:48:34PM +0300, Peter Pentchev wrote:
> > > > On Mon, Oct 21, 2002 at 03:24:08PM +0200, Linus Kendall wrote:
> > > > > On Mon, 2002-10-21 at 14:45, Peter Pentchev wrote:
> > > > > > On Mon, Oct 21, 2002 at 01:35:59PM +0200, Linus Kendall wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm trying to port a heavily threaded application from Linux (Debian
> > > > > > > 3.0, 2.4.19) to FreeBSD (4.6-RELEASE). The program compiles
> > > > > > > successfully using gcc with -pthreads. But, when I try to run the
> > > > > > > application I get the following error after a while (after spawning
> > > > > > > 11 threads):
> > > > > > >
> > > > > > > Fatal error 'siglongjmp()ing between thread contexts is undefined by
> > > > > > > POSIX 1003.1' at line ? in file
> > > > > > > /usr/src/lib/libc_r/uthread/uthread_jmp.c (errno = ?)
> > > > > > > Abort trap - core dumped
> > > > > > >
> [snip]
> > > > This is interesting; can you produce a simple testcase?  If not, I will
> > > > be able to take a look at it some time later today or tomorrow, but not
> > > > right now :(
> >
> > I'm not sure if I've really got time to produce a testcase. As I've
> > understood the main cause of the crash was that in *BSD the signals
> > are sent to each thread but in Linux they're sent to the process.
>
> Okay, I can see what the problem is; however, I have absolutely no idea
> how it is to be solved :(
>
> The DNS resolution routines of libcurl use alarm() as a timeout
> mechanism for the system DNS resolving functions.  To enforce the
> timeout even when the resolver functions are automatically restarted
> after the SIGALRM signal, libcurl attempts to set a jump buffer in the
> thread doing the DNS lookup, and to siglongjmp() to it from the SIGALRM
> handler.
>
> This works just fine on Linux, where each thread executes as a separate
> process; the signal is correctly delivered to the thread which invoked
> alarm(), and, consequently, exactly the one that set the jump buffer in
> the first place.
>
> On FreeBSD, however, the signal is delivered merely to the currently
> executing thread; if the resolver routines are currently in the process
> of sending or receiving data on a network socket, the currently
> executing thread may very well not be the one that has requested the
> resolving, and so siglongjmp() may be called from a thread which is NOT
> the one the jump buffer has been set in.  As the abort error message
> states, this is behavior not covered by any standards, and, I dare say,
> not very easy to implement at all, so it is currently unimplemented in
> FreeBSD.  For a standards reference, the SUSv2 siglongjmp() manpage at
> http://www.opengroup.org/onlinepubs/007908799/xsh/siglongjmp.html
> explicitly states at the end of the DESCRIPTION section:
>
>   The effect of a call to siglongjmp() where initialisation of the jmp_buf
>   structure was not performed in the calling thread is undefined.
>
> > Blocking all signals resulted in an application which executed but
> > still I got problems with slow responses from libcurl
>
> As I understand it, the only reason for SIGALRM to make a difference
> would be a situation where a DNS query times out, at least by libcurl's
> standards.  Is your application trying to do such lookups?
>
> If anybody is interested, I am attaching a short proof-of-concept
> program which starts up two threads, then waits for a signal handler to
> hit.  If the longjmp() call is commented out, it displays the thread ID
> of the thread which received the signal - almost always the main thread,
> the one listed as 'me' in the list output at the program start, and most
> definitely not the last thread to call setjmp(), as that would be 't2'.
> If the longjmp() call is uncommented, the signal handler executing in
> the 'me' thread will longjmp() to a buffer initialized in the 't2'
> thread, and the program will abort with your error message with a 100%
> failure (or would that be success in proving the concept?) rate.
>
> People knowledgeable about threads: would there be a way to fix that
> problem?  I don't know.. something like examining the jump buffer, then
> activating the thread that is stored there, and resuming the currently
> executing thread at the point where it was interrupted by the signal?
> Without looking at the code, I can guess that most probably the answer
> would be a short burst of hysterical laughter :)  Still.. one may hope..
> :)

That was very thorough, thanks! Now I at least have a notion of what
is going on. Since this is slightly urgent I guess a hack into the
libcurl source code to try to remove the sigalarms would do the trick
(in my case). In the general case it seems like there's a rather big
problem here as libcurl's behavior cannot really work together with the
FreeBSD implementation of threads.

/Linus.

> G'luck,
> Peter
>
> --
> Peter Pentchev	roam@ringlet.net	roam@FreeBSD.org
> PGP key:	http://people.FreeBSD.org/~roam/roam.key.asc
> Key fingerprint	FDBA FD79 C26F 3C51 C95E  DF9E ED18 B68D 1619 4553
> Hey, out there - is it *you* reading me, or is it someone else?
>
> #include <sys/types.h>
>
> #include <pthread.h>
> #include <setjmp.h>
> #include <signal.h>
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
>
> pthread_mutex_t	 mtxQ;		/* protects q[], tq[] and qcnt */
> int		 q[16];		/* signal numbers seen by the handler */
> pthread_t	 tq[16];	/* thread each signal was delivered to */
> size_t		 qcnt;		/* number of recorded signals */
> sigjmp_buf	 jmpbuf;	/* set by every thread; t2 sets it last */
>
> /* Record the signal and the thread it was delivered to. */
> static void
> sigalarm(int f)
> {
>
> 	pthread_mutex_lock(&mtxQ);
> 	q[qcnt] = f;
> 	tq[qcnt] = pthread_self();
> 	qcnt++;
> 	pthread_mutex_unlock(&mtxQ);
> 	/* Uncomment to jump into another thread's buffer and abort. */
> //	siglongjmp(jmpbuf, 5);
> }
>
> /* Thread body: set the shared jump buffer, then sleep for arg seconds. */
> static void *
> thr(void *arg)
> {
>
> 	sigsetjmp(jmpbuf, 0);
> 	sleep((unsigned int)(long)arg);
> 	return (NULL);
> }
>
> int
> main(void)
> {
> 	pthread_t t1, t2;
> 	size_t i;
> 	struct sigaction sa;
>
> 	sigsetjmp(jmpbuf, 0);
> 	pthread_mutex_init(&mtxQ, NULL);
> 	printf("me = %ld\n", (long)pthread_self());
> 	pthread_create(&t1, NULL, thr, (void *)4);
> 	printf("t1 = %ld\n", (long)t1);
> 	pthread_create(&t2, NULL, thr, (void *)5);
> 	printf("t2 = %ld\n", (long)t2);
> 	memset(&sa, 0, sizeof(sa));
> 	sa.sa_handler = sigalarm;
> 	sigemptyset(&sa.sa_mask);
> 	sigaddset(&sa.sa_mask, SIGALRM);
> 	sigaction(SIGALRM, &sa, NULL);
> 	alarm(1);
> 	printf("qcnt = %lu\n", (unsigned long)qcnt);
> 	sleep(3);
> 	printf("qcnt = %lu\n", (unsigned long)qcnt);
> 	sleep(3);
> 	printf("qcnt = %lu\n", (unsigned long)qcnt);
> 	sleep(3);
> 	printf("qcnt = %lu\n", (unsigned long)qcnt);
> 	/* One line per signal: index, signal number, receiving thread. */
> 	for (i = 0; i < qcnt; i++)
> 		printf("%2lu\t%d\t%ld\n", (unsigned long)i, q[i], (long)tq[i]);
> 	return (0);
> }
