Date:      07 Aug 2003 22:21:57 +0100
From:      Peter Edwards <peter.edwards@openet-telecom.com>
To:        current@freebsd.org
Subject:   Fun with gdb and threads...
Message-ID:  <1060291316.64739.58.camel@rocklobster.openet-telecom.lan>


--=-RRe50gU+xoc8NEVJj7Rx
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

Hi.
This might be of interest to anyone who has tried debugging
multi-threaded programs (of the libc_r variety) with gdb. This has been
bugging me for months, and I finally got frustrated enough to find out
what was going on.

The symptom:

When the program being debugged calls any function that puts a thread to
sleep, the target process crashes. (A simple test program, 1.c, is
attached, along with a log of gdb killing it in crash.txt.)

The problem:

I traced this to an interaction between gdb and the thread scheduler.
The initial crash comes from gdb adding internal breakpoints in the
"(_)?(sig)?longjmp" functions. One of these breakpoints gets hit when
"_thread_kern_sched" longjmps into the thread scheduler.

After handling the breakpoint, gdb needs to reset the instruction
pointer in the "current thread" to re-run the instruction the breakpoint
was at. At that point, however, gdb's freebsd_uthread_store_registers()
barfs: it thinks the thread in question is not "active", because it's
not in state PS_RUNNING (it's just about to go to sleep). As a result,
it mucks up the resetting of the instruction pointer, because it thinks
it just needs to twiddle the thread's saved context, rather than the
"live" registers.

Once the process is resumed, it starts in the middle of whatever
instruction the breakpoint overwrote, generally fscking things up.
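
To make the failure concrete, here is a toy C model of a breakpoint trap
and resume. It is only a sketch: the struct and function names are
invented for illustration and are not gdb's code. The byte values used
with it are the i386 encoding of "movl 4(%esp),%edx" (8B 54 24 04), the
first instruction of ___longjmp in the attached patch, and 0xCC is the
one-byte int3 breakpoint opcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INT3 0xCC   /* one-byte x86 breakpoint opcode */

/* Hypothetical record of one planted breakpoint. */
struct toy_breakpoint {
    size_t  addr;   /* offset of the patched byte in the code buffer */
    uint8_t saved;  /* original byte that the breakpoint replaced */
};

/* Plant a breakpoint: remember the original byte, then overwrite it. */
static struct toy_breakpoint toy_insert_bp(uint8_t *code, size_t addr)
{
    struct toy_breakpoint bp = { addr, code[addr] };
    code[addr] = TOY_INT3;
    return bp;
}

/* After the trap, the PC points one byte past the int3. */
static size_t toy_pc_after_trap(const struct toy_breakpoint *bp)
{
    return bp->addr + 1;
}

/*
 * Restore the original byte and return the PC the process resumes at.
 * With rewind != 0 the PC is moved back to the start of the instruction;
 * with rewind == 0 it is left where the trap put it, which models the
 * case where the new PC was written somewhere other than the live
 * registers.
 */
static size_t toy_resume_pc(uint8_t *code, const struct toy_breakpoint *bp,
                            size_t trap_pc, int rewind)
{
    assert(trap_pc == bp->addr + 1);
    code[bp->addr] = bp->saved;
    return rewind ? bp->addr : trap_pc;
}
```

With the four-byte movl at offset 0, resuming without the rewind leaves
the PC at offset 1, inside the instruction, so the CPU decodes the tail
bytes 54 24 04 as if they were new instructions.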

The fix:

I added a couple of "nop"s to "___longjmp", and created a new
entrypoint below them called "___longjmp_raw". This provides a way for
the libc_r library to avoid hitting the gdb breakpoints at sensitive
moments. All other consumers still work the exact same way (modulo the
time spent executing a couple of nops). The patch is attached, and makes
gdb behave perfectly for me.
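
The idea behind the second entry point can be sketched with a toy code
buffer. This is an illustration under simplifying assumptions, not the
real libc layout: the offsets 0 and 2 are hypothetical (they ignore any
alignment padding the assembler may add), and all names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NOP  0x90   /* i386 nop */
#define TOY_INT3 0xCC   /* i386 one-byte breakpoint */

/* Hypothetical layout: exposed entry at offset 0, raw entry after the nops. */
enum { TOY_LONGJMP = 0, TOY_LONGJMP_RAW = 2 };

/* Overwrite the byte at `entry` the way a debugger plants a breakpoint. */
static void toy_plant_bp(uint8_t *code, size_t len, size_t entry)
{
    assert(entry < len);
    code[entry] = TOY_INT3;
}

/* A caller traps only if the byte at its own entry point is a breakpoint. */
static int toy_entry_traps(const uint8_t *code, size_t len, size_t entry)
{
    assert(entry < len);
    return code[entry] == TOY_INT3;
}
```

A breakpoint planted at TOY_LONGJMP replaces the first nop, so callers
entering there still trap, while a caller that jumps straight to
TOY_LONGJMP_RAW never executes the patched byte.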

Does anyone have any comments on this, or ideas on how to improve on it?
The only penalty I can see is a couple of extra "nop" instructions for
normal longjmps, which I'll gladly trade for a usable debugger.

PS:

Before anyone suggests it: I initially tried changing freebsd_uthread.c
to check for the active thread more effectively, as is done in
freebsd_uthread_fetch_registers(), by comparing it with "_pthread_run"
rather than checking the state.

This improved things, but gdb still got confused and started stopping
unexpectedly when it lost its breakpoints, etc., so I figured the other
approach was probably going to be more stable.
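
For reference, the difference between the two "is this thread active?"
checks can be modelled roughly as below. This is a toy sketch, not the
code in freebsd_uthread.c: every type and function name is invented,
TOY_RUNNING stands in for PS_RUNNING, and the "run" pointer stands in
for "_pthread_run".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy scheduler states; TOY_RUNNING stands in for PS_RUNNING. */
enum toy_state { TOY_RUNNING, TOY_SLEEPING };

struct toy_thread {
    enum toy_state state;     /* state recorded by the scheduler */
    unsigned long  saved_pc;  /* PC in the saved context (stale while on CPU) */
};

/* Toy debugger view of the target process. */
struct toy_process {
    struct toy_thread *run;   /* thread on the CPU, cf. "_pthread_run" */
    unsigned long live_pc;    /* PC in the real CPU registers */
};

typedef bool (*toy_active_fn)(const struct toy_process *,
                              const struct toy_thread *);

/* State-based check: wrong for a thread that is just about to sleep. */
static bool active_by_state(const struct toy_process *p,
                            const struct toy_thread *t)
{
    (void)p;
    return t->state == TOY_RUNNING;
}

/* Identity-based check: compare against the running-thread pointer. */
static bool active_by_identity(const struct toy_process *p,
                               const struct toy_thread *t)
{
    return p->run == t;
}

/* Store a new PC for thread t, choosing the register set via is_active. */
static void toy_store_pc(struct toy_process *p, struct toy_thread *t,
                         unsigned long pc, toy_active_fn is_active)
{
    assert(p != NULL && t != NULL);
    if (is_active(p, t))
        p->live_pc = pc;    /* thread is on the CPU: patch live registers */
    else
        t->saved_pc = pc;   /* thread is switched out: patch saved context */
}
```

For a thread that is still on the CPU but already marked as sleeping,
the state-based check sends the new PC to the saved context and leaves
the live PC untouched; the identity-based check updates the live PC.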


--=-RRe50gU+xoc8NEVJj7Rx
Content-Disposition: attachment; filename=patch.txt
Content-Type: text/x-patch; name=patch.txt; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit

Index: lib/libc/i386/gen/_setjmp.S
===================================================================
RCS file: /pub/FreeBSD/development/FreeBSD-CVS/src/lib/libc/i386/gen/_setjmp.S,v
retrieving revision 1.16
diff -u -r1.16 _setjmp.S
--- lib/libc/i386/gen/_setjmp.S	23 Mar 2002 02:05:17 -0000	1.16
+++ lib/libc/i386/gen/_setjmp.S	7 Aug 2003 20:42:08 -0000
@@ -66,6 +66,17 @@
 	.weak	CNAME(_longjmp)
 	.set	CNAME(_longjmp),CNAME(___longjmp)
 ENTRY(___longjmp)
+/*
+ * Debuggers tend to put breakpoints in longjmp, while
+ * threads libraries don't like to be interrupted.
+ * The nops below pad the exposed "___longjmp" entry point, so
+ * a debugger breakpoint placed there does not touch the first
+ * instruction of ___longjmp_raw. The threads library can then
+ * call ___longjmp_raw with impunity.
+ */
+	nop
+	nop
+ENTRY(___longjmp_raw)
 	movl	4(%esp),%edx
 	movl	8(%esp),%eax
 	movl	0(%edx),%ecx
Index: lib/libc_r/uthread/uthread_kern.c
===================================================================
RCS file: /pub/FreeBSD/development/FreeBSD-CVS/src/lib/libc_r/uthread/uthread_kern.c,v
retrieving revision 1.45
diff -u -r1.45 uthread_kern.c
--- lib/libc_r/uthread/uthread_kern.c	5 Oct 2002 02:22:26 -0000	1.45
+++ lib/libc_r/uthread/uthread_kern.c	7 Aug 2003 20:39:44 -0000
@@ -95,7 +95,7 @@
 	curthread->check_pending = 1;
 
 	/* Switch to the thread scheduler: */
-	___longjmp(_thread_kern_sched_jb, 1);
+	___longjmp_raw(_thread_kern_sched_jb, 1);
 }
 
 
@@ -165,7 +165,7 @@
 		}
 	}
 	/* Switch to the thread scheduler: */
-	___longjmp(_thread_kern_sched_jb, 1);
+	___longjmp_raw(_thread_kern_sched_jb, 1);
 }
 
 void
@@ -582,7 +582,7 @@
 #if NOT_YET
 			_setcontext(&curthread->ctx.uc);
 #else
-			___longjmp(curthread->ctx.jb, 1);
+			___longjmp_raw(curthread->ctx.jb, 1);
 #endif
 			/* This point should not be reached. */
 			PANIC("Thread has returned from sigreturn or longjmp");

--=-RRe50gU+xoc8NEVJj7Rx
Content-Disposition: attachment; filename=crash.txt
Content-Type: text/plain; name=crash.txt; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit

petere@rocklobster$ gcc -o 1 -g -Wall -pthread 1.c
petere@rocklobster$ gdb ./1    
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd"...
(gdb) b threadFunc
Breakpoint 1 at 0x804861e: file 1.c, line 10.
(gdb) run
Starting program: /local/petere/1 

Breakpoint 1, threadFunc (arg=0x0) at 1.c:10
10          sleep(1);
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
0x280d0138 in _longjmp () from /usr/lib/libc.so.5
(gdb) 

--=-RRe50gU+xoc8NEVJj7Rx--


