From: Linus Kendall
To: Peter Pentchev
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: PThreads problem
Date: 21 Oct 2002 22:31:48 +0200
Message-Id: <1035232308.24315.37.camel@bilbo>
In-Reply-To: <20021021194453.GB377@straylight.oblivion.bg>
References: <1035200159.24315.13.camel@bilbo>
	<20021021124520.GS389@straylight.oblivion.bg>
	<1035206648.24315.20.camel@bilbo>
	<20021021134834.GA41198@straylight.oblivion.bg>
	<20021021135045.GB41198@straylight.oblivion.bg>
	<1035218026.24330.33.camel@bilbo>
	<20021021194453.GB377@straylight.oblivion.bg>

On Mon, 2002-10-21 at 21:44, Peter Pentchev wrote:
> On Mon, Oct 21, 2002 at 06:33:46PM +0200, Linus Kendall wrote:
> > Answer inline below.
> >
> > On Mon, 2002-10-21 at 15:50, Peter Pentchev wrote:
> > > On Mon, Oct 21, 2002 at 04:48:34PM +0300, Peter Pentchev wrote:
> > > > On Mon, Oct 21, 2002 at 03:24:08PM +0200, Linus Kendall wrote:
> > > > > On Mon, 2002-10-21 at 14:45, Peter Pentchev wrote:
> > > > > > On Mon, Oct 21, 2002 at 01:35:59PM +0200, Linus Kendall wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm trying to port a heavily threaded application from Linux
> > > > > > > (Debian 3.0, 2.4.19) to FreeBSD (4.6-RELEASE). The program
> > > > > > > compiles successfully using gcc with -pthreads. But, when I try
> > > > > > > to run the application I get the following error after a while
> > > > > > > (after spawning 11 threads):
> > > > > > >
> > > > > > > Fatal error 'siglongjmp()ing between thread contexts is undefined by
> > > > > > > POSIX 1003.1' at line ? in file
> > > > > > > /usr/src/lib/libc_r/uthread/uthread_jmp.c (errno = ?)
> > > > > > > Abort trap - core dumped
> > > > > > >
> [snip]
> > > > This is interesting; can you produce a simple testcase?  If not, I will
> > > > be able to take a look at it some time later today or tomorrow, but not
> > > > right now :(
> >
> > I'm not sure if I've really got time to produce a testcase. As I've
> > understood the main cause of the crash was that in *BSD the signals
> > are sent to each thread but in Linux they're sent to the process.
>
> Okay, I can see what the problem is; however, I have absolutely no idea
> how it is to be solved :(
>
> The DNS resolution routines of libcurl use alarm() as a timeout
> mechanism for the system DNS resolving functions.  To enforce the
> timeout even when the resolver functions are automatically restarted
> after the SIGALRM signal, libcurl attempts to set a jump buffer in the
> thread doing the DNS lookup, and to siglongjmp() to it from the SIGALRM
> handler.
>
> This works just fine on Linux, where each thread executes as a separate
> process; the signal is correctly delivered to the thread which invoked
> alarm(), and, consequently, exactly the one that set the jump buffer in
> the first place.
>
> On FreeBSD, however, the signal is delivered merely to the currently
> executing thread; if the resolver routines are currently in the process
> of sending or receiving data on a network socket, the currently
> executing thread may very well not be the one that has requested the
> resolving, and so siglongjmp() may be called from a thread which is NOT
> the one the jump buffer has been set in.  As the abort error message
> states, this is behavior not covered by any standards, and, I dare say,
> not very easy to implement at all, so it is currently unimplemented in
> FreeBSD.  For a standards reference, the SUSv2 siglongjmp() manpage at
> http://www.opengroup.org/onlinepubs/007908799/xsh/siglongjmp.html
> explicitly states at the end of the DESCRIPTION section:
>
>   The effect of a call to siglongjmp() where initialisation of the jmp_buf
>   structure was not performed in the calling thread is undefined.
>
> > Blocking all signals resulted in an application which executed but
> > still I got problems with slow responses from libcurl
>
> As I understand it, the only reason for SIGALRM to make a difference
> would be a situation where a DNS query times out, at least by libcurl's
> standards.  Is your application trying to do such lookups?
>
> If anybody is interested, I am attaching a short proof-of-concept
> program which starts up two threads, then waits for a signal handler to
> hit.  If the longjmp() call is commented out, it displays the thread ID
> of the thread which received the signal - almost always the main thread,
> the one listed as 'me' in the list output at the program start, and most
> definitely not the last thread to call setjmp(), as that would be 't2'.
> If the longjmp() call is uncommented, the signal handler executing in
> the 'me' thread will longjmp() to a buffer initialized in the 't2'
> thread, and the program will abort with your error message with a 100%
> failure (or would that be success in proving the concept?) rate.
>
> People knowledgeable about threads: would there be a way to fix that
> problem?  I don't know.. something like examining the jump buffer, then
> activating the thread that is stored there, and resuming the currently
> executing thread at the point where it was interrupted by the signal?
> Without looking at the code, I can guess that most probably the answer
> would be a short burst of hysterical laughter :)  Still.. one may hope..
> :)

That was very thorough, thanks! Now I at least have a notion of what
is going on. Since this is slightly urgent, I guess hacking the libcurl
source code to remove its use of SIGALRM would do the trick (in my
case). In the general case there seems to be a rather big problem here,
as libcurl's alarm()-based timeout cannot really work together with the
FreeBSD implementation of threads.

/Linus.

> G'luck,
> Peter
>
> -- 
> Peter Pentchev	roam@ringlet.net	roam@FreeBSD.org
> PGP key:	http://people.FreeBSD.org/~roam/roam.key.asc
> Key fingerprint	FDBA FD79 C26F 3C51 C95E  DF9E ED18 B68D 1619 4553
> Hey, out there - is it *you* reading me, or is it someone else?
>
> #include <pthread.h>
>
> #include <setjmp.h>
> #include <signal.h>
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
>
> pthread_mutex_t mtxQ;
> int q[16];		/* signal numbers seen by the handler */
> pthread_t tq[16];	/* thread each signal was delivered to */
> size_t qcnt;
> sigjmp_buf jmpbuf;	/* overwritten by every thread that calls sigsetjmp() */
>
> static void
> sigalarm(int f)
> {
>
> 	/* Locking a mutex here is not async-signal-safe; proof of concept only. */
> 	pthread_mutex_lock(&mtxQ);
> 	q[qcnt] = f;
> 	tq[qcnt] = pthread_self();
> 	qcnt++;
> 	pthread_mutex_unlock(&mtxQ);
> //	siglongjmp(jmpbuf, 5);
> }
>
> static void *
> thr(void *arg)
> {
>
> 	sigsetjmp(jmpbuf, 0);
> 	sleep((int)(long)arg);	/* the sleep time is passed in the pointer value */
> 	return (NULL);
> }
>
> int
> main(void)
> {
> 	pthread_t t1, t2;
> 	size_t i;
> 	struct sigaction sa;
>
> 	sigsetjmp(jmpbuf, 0);
> 	pthread_mutex_init(&mtxQ, NULL);
> 	printf("me = %ld\n", (long)pthread_self());
> 	pthread_create(&t1, NULL, thr, (void *)4);
> 	printf("t1 = %ld\n", (long)t1);
> 	pthread_create(&t2, NULL, thr, (void *)5);
> 	printf("t2 = %ld\n", (long)t2);
> 	memset(&sa, 0, sizeof(sa));
> 	sa.sa_handler = sigalarm;
> 	sigemptyset(&sa.sa_mask);
> 	sigaddset(&sa.sa_mask, SIGALRM);
> 	sigaction(SIGALRM, &sa, NULL);
> 	alarm(1);
> 	printf("qcnt = %lu\n", (unsigned long)qcnt);
> 	sleep(3);
> 	printf("qcnt = %lu\n", (unsigned long)qcnt);
> 	sleep(3);
> 	printf("qcnt = %lu\n", (unsigned long)qcnt);
> 	sleep(3);
> 	printf("qcnt = %lu\n", (unsigned long)qcnt);
> 	/* One row per delivered signal: index, signal number, receiving thread. */
> 	for (i = 0; i < qcnt; i++)
> 		printf("%2lu\t%d\t%ld\n", (unsigned long)i, q[i], (long)tq[i]);
> 	return (0);
> }