Date: Fri, 21 Nov 2003 08:26:09 -0800
From: Marcel Moolenaar
To: David Xu
Cc: David Xu, threads@freebsd.org
Subject: Re: KSE/ia64 broken
Message-ID: <20031121162609.GA3258@dhcp01.pn.xcllnt.net>
In-Reply-To: <3FBE061B.3070206@freebsd.org>

On Fri, Nov 21, 2003 at 08:33:31PM +0800, David Xu wrote:
> >
> >Ok. More pieces of the puzzle.
> >If I apply the attached patch (against
> >clean sources), I get the following:
> >
> >itanium% ./foo.bad
> >XXX:_thr_alloc: thread=200000000008a000, tcb=2000000000085000
> >XXX:_thr_alloc: thread=2000000000090000, tcb=2000000000090000
> >
> >The second _thr_alloc() is screwed up, in that malloc() returns
> >the same pointer twice. Hence thread->tcb points to thread itself
> >and we're clobbering our thread structure.
> >
> I saw the same result.
>
> >Since thr_spinlock.c
> >affects the locking of malloc(), we may have a race condition.
> >Note that forcing an upcall (by adding a _thread_printf() in the
> >code stream) seems to fix it. Does the UTS call malloc when first
> >invoked?
> >
> No, we never call malloc in such case. I suspect we do not
> fully restore thread's context. In kernel, I pass zero as third
> parameter to get_mcontext(), is it enough for ia64 ?

Yes. The context is asynchronous. We save and restore all scratch
registers, including the high FP registers. Note that an incorrect
context restoration would very likely not have such a clean failure
mode.

The thing that bugs me is that if you add a _thread_printf() just
prior to the call to _thr_alloc(), you trigger an upcall. That seems
to make all the difference. It's as if we have to keep the UTS from
getting its first upcall while a spinlock is held.

What also bugs me is that the second malloc happily returns the same
address as the malloc immediately prior to it. There's no indication
of corruption. It's as if the first malloc never happened, or the
memory got freed in between. Looked at from a more context-oriented
point of view, it's as if the second malloc returns the result of the
first, as though the context of the first (assuming it got saved) is
restored by the second. This could mean that, if the context switching
itself is normal, we missed saving a context and are restoring a
stale one.

Anyway: upcalls play a key role.

BTW: Maybe an interesting experiment is to disable upcalls on page
faults on i386 and see if that makes a difference. We do not have
upcalls for page faults on ia64. There may be an upcall on i386 that
we do not get on ia64...

-- 
 Marcel Moolenaar	  USPA: A-39004		marcel@xcllnt.net