Date:      Fri, 29 Oct 2004 16:56:25 -0400
From:      John Baldwin <jhb@FreeBSD.org>
To:        Daniel Eischen <deischen@FreeBSD.org>
Cc:        threads@FreeBSD.org
Subject:   Re: Infinite loop bug in libc_r on 4.x with condition variables and signals
Message-ID:  <200410291656.25609.jhb@FreeBSD.org>
In-Reply-To: <Pine.GSO.4.43.0410281825210.5783-100000@sea.ntplx.net>
References:  <Pine.GSO.4.43.0410281825210.5783-100000@sea.ntplx.net>

On Thursday 28 October 2004 06:43 pm, Daniel Eischen wrote:
> On Thu, 28 Oct 2004, Daniel Eischen wrote:
> > On Thu, 28 Oct 2004, John Baldwin wrote:
> > > On Wednesday 27 October 2004 06:30 pm, Daniel Eischen wrote:
> > > > On Wed, 27 Oct 2004, John Baldwin wrote:
> > > > > FWIW, we are having (I think) the same problem on 5.3 with
> > > > > libpthread. The panic there is in the mutex code about an assertion
> > > > > failing because a thread is on a syncq when it is not supposed to
> > > > > be.
> > > >
> > > > David and I recently fixed some races in pthread_join() and
> > > > pthread_exit() in -current libpthread.  Don't know if those
> > > > were responsible...
> > > >
> > > > Here's a test program that shows correct behavior with both
> > > > libc_r and libpthread in -current.
> > >
> > > We've started testing on -current and are seeing several problems with
> > > libpthread.  Using a UP kernel (machines have single processor with
> > > HTT) seems to make it better, but we seem to be getting SIG 11's in
> > > pthread_testcancel() as well as the failed lock assertions that were
> > > mentioned earlier on the list in the PR.  Just running monodevelop from
> > > the bsd-sharp stuff mentioned earlier can break in that one of the
> > > processes dies with the assertion failure.  If you let the other
> > > processes run, then you can run it again and get the window to pop up,
> > > but then clicking on any of the controls results in the
> > > pthread_testcancel() crash.  FWIW, I think the reason that the stack
> > > traces look weird in the PR's thread may be due to catching a signal. 
> > > When we were looking at the problems with libc_r on 4.x we would get
> > > some weird looking backtraces sometimes when the assertion in
> > > uthread_sig.c that I added failed.  Seems that gdb doesn't handle the
> > > signal frames very well.
> >
> > You also want to make sure you're not running out of stack space
> > for your threads.
> >
> > Is the code trying to install signal frames on threads itself?
> > That could cause the problems you are seeing.
>
> I went back to the monodoc test case in the PR.  Running under
> the debugger gives this:
>
> (gdb) run /usr/local/lib/mono/1.0/mcs.exe -out:browser.exe ./browser.cs
> ./list.cs               ./elabel.cs             ./history.cs
> ./Contributions.cs      ./XmlNodeWriter.cs
> -resource:./../monodoc.png,monodoc.png
> -resource:./browser.glade,browser.glade -pkg:gtkhtml-sharp -pkg:glade-sharp
> -r:System.Web.Services -r:./monodoc.dll
> Starting program:
> /usr/local/bin/mono /usr/local/lib/mono/1.0/mcs.exe -out:browser.exe
> ./browser.cs                 ./list.cs
> ./elabel.cs             ./history.cs             ./Contributions.cs
> ./XmlNodeWriter.cs -resource:./../monodoc.png,monodoc.png
> -resource:./browser.glade,browser.glade  -pkg:gtkhtml-sharp
> -pkg:glade-sharp -r:System.Web.Services -r:./monodoc.dll
> [Switching to Thread 1 (LWP 100074)]
>
> Breakpoint 1, 0x0804862e in main ()
> (gdb) cont
> Continuing.
> [Switching to Thread 4 (LWP 100128)]
>
> Breakpoint 2, 0x2842c801 in __assert () from /lib/libc.so.5
> (gdb) bt
> #0  0x2842c801 in __assert () from /lib/libc.so.5
> #1  0x2837ce4e in _lock_acquire (lck=0x8062f00, lu=0x8110e48,
> prio=674751930) at /opt/FreeBSD/src/lib/libpthread/sys/lock.c:171
> #2  0x2837010b in mutex_lock_common (curthread=0x8110e00, m=0x28482434,
> abstime=0x0) at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:495
> #3  0x28371677 in __pthread_mutex_lock (m=0x28482434)
>     at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:796
> #4  0x28171cc6 in WaitForSingleObjectEx (handle=0xe, timeout=500,
> alertable=0) at handles-private.h:97
> #5  0x2816b116 in CreateProcess
> (appname=0xd, cmdline=0x8092ac4, process_attrs=0x0, thread_attrs=0x0,
> inherit_handles=1, create_flags=1024, new_environ=0x0, cwd=0x0,
> startup=0xbf8ec78c, process_info=0xbf8ec77c) at processes.c:427
> #6  0x2813ef4f in ves_icall_System_Diagnostics_Process_Start_internal
> (appname=0x80f89d8, cmd=0x8092ab8, dirname=0x808ff30,
> stdin_handle=0x2837e5ba, stdout_handle=0x2837e5ba,
> stderr_handle=0x2837e5ba, process_info=0xbf8ec964) at process.c:870
> #7  0x28f548ff in ?? ()
> #8  0x080f89d8 in ?? ()
> #9  0x08092ab8 in ?? ()
> #10 0x0808ff30 in ?? ()
> #11 0x00000009 in ?? ()
> #12 0x0000000d in ?? ()
> #13 0x0000000b in ?? ()
> #14 0xbf8ec964 in ?? ()
> #15 0x0812d420 in ?? ()
> #16 0x0812d408 in ?? ()
> #17 0x0820d300 in ?? ()
> #18 0x0808ff30 in ?? ()
> #19 0x08092ab8 in ?? ()
> #20 0x080f89d8 in ?? ()
> #21 0xbf8ec838 in ?? ()
> #22 0x28f548cc in ?? ()
> #23 0xbf8ec98c in ?? ()
> #24 0x28f542aa in ?? ()
> ---Type <return> to continue, or q <return> to quit---
> #25 0x080f89d8 in ?? ()
> #26 0x08092ab8 in ?? ()
> #27 0x0808ff30 in ?? ()
> #28 0x00000009 in ?? ()
> #29 0x0000000d in ?? ()
> #30 0x0000000b in ?? ()
> #31 0xbf8ec964 in ?? ()
> #32 0x28371bfe in mutex_unlock_common (m=0xb, add_reference=134818488)
>     at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:984
> Previous frame inner to this frame (corrupt stack?)
> (gdb) info threads
>   5 Thread 2 (LWP 100137)  0x2837bfd3 in kse_release () at kse_release.S:2
>   4 Thread 3 (sleeping)  0x28373d0f in _thr_sched_switch_unlocked
> (curthread=0x8110000) at pthread_md.h:225
> * 3 Thread 4 (LWP 100128)  0x2842c801 in __assert () from /lib/libc.so.5
>   2 Thread 1 (sleeping)  0x28373d0f in _thr_sched_switch_unlocked
> (curthread=0x8053000) at pthread_md.h:225
> (gdb) thread 3
> [Switching to thread 3 (Thread 4 (LWP 100128))]#0  0x2842c801 in __assert
> () from /lib/libc.so.5
> (gdb) frame 2
> #2  0x2837010b in mutex_lock_common (curthread=0x8110e00, m=0x28482434,
> abstime=0x0) at /opt/FreeBSD/src/lib/libpthread/thread/thr_mutex.c:495
> 495                     THR_LOCK_ACQUIRE(curthread, &(*m)->m_lock);
> (gdb) print curthread->uniqueid
> $36 = 3
> (gdb) print/x curthread->magic
> $37 = 0xd09ba115
> (gdb) print/x **m
> $39 = {m_lock = {l_head = 0x7273752f, l_tail = 0x636f6c2f, l_type =
> 0x6c2f6c61, l_wait = 0x6d2f6269, l_wakeup = 0x726f6373}, m_type =
> 0x2e62696c, m_protocol = 0x7c6c6c64, m_queue = { tqh_first = 0x74737953,
> tqh_last = 0x522e6d65}, m_owner = 0x69746e75, m_flags = 0x532e656d, m_count
> = 0x61697265, m_refcount = 0x617a696c, m_prio = 0x6e6f6974, m_saved_prio =
> 0x6553492e, m_qe = {tqe_next = 0x6c616972, tqe_prev = 0x62617a69}}
>
> The thread seems to be correct, but the mutex is trashed.  It's not
> a valid mutex and the lock type (l_type) does indeed have LCK_PRIORITY
> set.  Note that libpthread doesn't create any locks of this type, so
> this trips the assertion failure.

Actually, I think we are looking at a buffer overflow in mono.  If you treat 
the mutex as a string and print it you get this:

> echo "7273752f636f6c2f6c2f6c616d2f6269726f63732e62696c7c6c6c6474737953522e6d6569746e75532e656d61697265617a696c6e6f69746553492e,6c61697262617a69" \
    | sed -e 's/../ 0x&/g' | dh
rsu/col/l/lam/birocs.bil|lldtsySR.meitnuS.emaireazilnoiteSI.

That looks like a path name whose 32-bit words were printed in network
(big-endian) order.  Putting each word back in host order gives:

/usr/local/lib/mscorlib.dll|System.Runtime.Serialization.ISe

Which is the filename and part of the class name for a .NET class, so this is
starting to look like mono overflowed some malloc'd buffer that happened to be
next to the mutex.  Actually, the string starts right at the beginning of the
mutex, so it looks more like there is a bug in mono where it is treating a
pthread_mutex_t as a char * or some such.  Looks like it would be in
mono_class_from_name() or something that it calls, since that is the only
place that string seems to come from.

Oh, this is a known bug in mono-1.0, which has problems with freeing mutexes
and using them afterwards.  To see what we are doing you'll have to get
mono-1.0.2 using the stuff from the bsd-sharp project.  We can look at this
again ourselves though, since our problems may be another case of mono using
a mutex after it has been freed.
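
For anyone who wants to repeat the byte-order check above without a local
hex-dump helper, here is a minimal sketch in Python.  It was not part of the
original debugging session.  It assumes the 17 words in the gdb dump of **m
sit contiguously in memory in field order with no padding, and the names
WORDS and decode_words are made up for illustration.

```python
import struct

# 32-bit words copied from the gdb "print/x **m" output, in field order
# (m_lock.l_head first, m_qe.tqe_prev last).
WORDS = [
    0x7273752f, 0x636f6c2f, 0x6c2f6c61, 0x6d2f6269, 0x726f6373,
    0x2e62696c, 0x7c6c6c64, 0x74737953, 0x522e6d65, 0x69746e75,
    0x532e656d, 0x61697265, 0x617a696c, 0x6e6f6974, 0x6553492e,
    0x6c616972, 0x62617a69,
]


def decode_words(words, byteorder="<"):
    """Pack the words with the given byte order ('<' little-endian,
    '>' big-endian) and return the resulting bytes as ASCII text."""
    raw = struct.pack(f"{byteorder}{len(words)}I", *words)
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    # Big-endian (network order): each group of four characters is reversed.
    print(decode_words(WORDS, ">"))
    # Little-endian (i386 host order): the bytes as they sit in memory.
    print(decode_words(WORDS, "<"))
```

The little-endian decode of all 17 words reads
/usr/local/lib/mscorlib.dll|System.Runtime.Serialization.ISerializab, i.e. the
same string as above plus the last two words of the structure.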

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


