Date:      Mon, 17 Nov 2003 09:35:21 +0800
From:      David Xu <davidxu@viatech.com.cn>
To:        deischen@freebsd.org
Cc:        Marcel Moolenaar <marcel@xcllnt.net>
Subject:   Re: KSE/ia64 broken
Message-ID:  <3FB825D9.6050407@viatech.com.cn>
In-Reply-To: <Pine.GSO.4.10.10311161951020.11563-100000@pcnet5.pcnet.com>
References:  <Pine.GSO.4.10.10311161951020.11563-100000@pcnet5.pcnet.com>

Daniel Eischen wrote:

>On Sun, 16 Nov 2003, Marcel Moolenaar wrote:
>
>>On Sun, Nov 16, 2003 at 04:55:44PM -0500, Daniel Eischen wrote:
>>
>>>On Sun, 16 Nov 2003, Marcel Moolenaar wrote:
>>>
>>>>>The same thread (main thread) is being resumed over and over again
>>>>>which shouldn't happen for this simple program.
>>>>>
>>>>Can it be that the thread is deadlocked? There's no forward progress.
>>>>There's only context switching...
>>>>
>>>I don't think so.  I think the thread stack/frame is corrupted, either
>>>because it is copied out or resumed incorrectly.  I'll do some more
>>>digging.
>>>
>>I loaded it up in the simulator. The thread is continuously being
>>resumed because of a page fault that results in an upcall, which
>>ends up in the UTS, which selects the same thread, which causes the
>>page fault again.
>
>Is it possible the thread is marked for an upcall when the
>page is not yet present?
>
Currently, on IA64, a page fault never schedules an upcall. I have only
enabled it on i386, and Peter enabled it on AMD64.

>
>>The page fault is the result of a bogus address
>>that in the debugger results in a SIGILL. However, when we don't
>>run in a debugger, the SIGILL doesn't get handled. Hence the non-
>>forward progress.
>>
>>The extensive debug information I posted earlier is therefore still
>>relevant. Now that I have things running in the simulator I'll see
>>if I can figure out where things go wrong. Chances are that we now
>>have an upcall where we didn't have one before and that it exposes
>>incomplete state (such as a thread pointer that hasn't been set).
>>The incomplete state causes the corruption we're seeing.
>
>This is kind of what I was thinking too.
>
The memory block returned from malloc() is being used by unknown code. I
don't know why this happens, but if you waste a memory block by applying the
following patch to thr_alloc(), things work:

Index: thr_kern.c
===================================================================
RCS file: /home/ncvs/src/lib/libpthread/thread/thr_kern.c,v
retrieving revision 1.102
diff -u -r1.102 thr_kern.c
--- thr_kern.c  9 Nov 2003 00:37:14 -0000       1.102
+++ thr_kern.c  17 Nov 2003 01:24:59 -0000
@@ -2422,6 +2422,8 @@
        struct pthread  *thread = NULL;
        int i;
 
+       malloc(sizeof(struct pthread));
+
        if (curthread != NULL) {
                if (GC_NEEDED())
                        _thr_gc(curthread);
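
For illustration only, the idea behind the workaround can be sketched in
standalone C. Allocating a throwaway block first means the structure that is
actually used is a different block from the first one handed out, so whatever
is scribbling on that first block presumably no longer hits live thread state.
The names struct demo_thread and alloc_after_padding() are hypothetical and
are not libpthread code. Where malloc() places each block is up to the
allocator; the only thing guaranteed is that two live allocations never share
an address.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pthread; the real structure is larger. */
struct demo_thread {
    int  id;
    char state[64];
};

/*
 * Allocate a sacrificial block, then the block that is really used.
 * The sacrificial block is handed back through *pad so the caller can
 * free it later; the patch above simply leaks it.
 * Returns NULL (and sets *pad to NULL) if either allocation fails.
 */
struct demo_thread *
alloc_after_padding(void **pad)
{
    struct demo_thread *t;

    assert(pad != NULL);

    /* The wasted block: same size as the real object, never used. */
    *pad = malloc(sizeof(struct demo_thread));
    if (*pad == NULL)
        return NULL;

    /* The real object, zero-initialized. */
    t = calloc(1, sizeof(*t));
    if (t == NULL) {
        free(*pad);
        *pad = NULL;
    }
    return t;
}
```

A caller would declare a void *pad, call alloc_after_padding(&pad), and free
both pointers when done. This only masks the symptom: it does not identify the
code that writes to the first block.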



