Date:      Wed, 17 Aug 2011 11:45:41 +0200
From:      Marius Strobl <marius@alchemy.franken.de>
To:        Peter Jeremy <peterjeremy@acm.org>
Cc:        freebsd-sparc64@freebsd.org
Subject:   Re: 'make -j16 universe' gives SIReset
Message-ID:  <20110817094541.GJ48988@alchemy.franken.de>
In-Reply-To: <20110816214820.GA35017@server.vk2pj.dyndns.org>
References:  <20110526234728.GA69750@server.vk2pj.dyndns.org> <20110527120659.GA78000@alchemy.franken.de> <20110601231237.GA5267@server.vk2pj.dyndns.org> <20110608224801.GB35494@alchemy.franken.de> <20110613235144.GA12470@server.vk2pj.dyndns.org> <20110813143807.GY48988@alchemy.franken.de> <20110816214820.GA35017@server.vk2pj.dyndns.org>

On Wed, Aug 17, 2011 at 07:48:20AM +1000, Peter Jeremy wrote:
> On 2011-Aug-13 16:38:07 +0200, Marius Strobl <marius@alchemy.franken.de> wrote:
> >Could you please give the following patch with SCHED_4BSD (cpu_switch()
> >still is missing support for SCHED_ULE) with something like -j128
> >buildworlds a try on your V890?
> >http://people.freebsd.org/~marius/sparc64_replace_sched_lock_w_atomic.diff
> 
> Getting better but still not perfect.  It survived a couple of -j128
> buildworlds with another six -j16 buildworlds running in parallel.

Thanks!

> 
> But it still has the same issue with pho's stress test - a thr1 process
> is blocked in urdlck.  The improvement is that there's only one stuck
> process and it took 7? hrs at INCARNATIONS=150 instead of 1-2 hours.
> (And it runs out of witness locks).
> 

Well, the sole purpose of that patch is to get rid of the MD sched_lock
usage so that support for SCHED_ULE can be added in a next step.
It's not obvious why this should have an impact on the problem with the
userland mutex code. In fact, using sched_lock provided more protection
than the atomic operations that replace it, although the latter should
still be sufficient for what we need to guarantee. If anything, I'd
expect the patch to create problems in case I've overlooked something,
not to solve any :)
If it indeed has a positive impact on the userland mutex problem, then
my best guess is that this is a side effect of the memory barriers the
patch adds to the context switching. That would indicate that the cause
of the problem is in fact missing memory barriers in the userland mutex
code, which IMO is one of the suspicious things about that code.
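
To illustrate the kind of barrier placement at issue, here is a minimal
sketch in portable C11. It is not the FreeBSD userland mutex code and
not anything from the patch; toy_lock, worker and run_demo are
hypothetical names made up for the example. The lock is taken with
acquire ordering and dropped with release ordering. Weakening either
one to a relaxed operation is the sort of omission that can go unnoticed
on strongly ordered CPUs and only show up as rare hangs or lost updates
under heavy load on weaker memory models.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical toy lock for illustration; not the libthr/umtx code. */
struct toy_lock {
	atomic_int owned;		/* 0 = free, 1 = held */
};

static struct toy_lock lock;		/* zero-initialized: free */
static long counter;			/* protected by lock */

static void
toy_lock_acquire(struct toy_lock *l)
{
	int expected = 0;

	/*
	 * Acquire ordering on success: accesses inside the critical
	 * section cannot be reordered before the lock is taken.
	 */
	while (!atomic_compare_exchange_weak_explicit(&l->owned, &expected,
	    1, memory_order_acquire, memory_order_relaxed)) {
		expected = 0;	/* a failed CAS stored the observed value */
		sched_yield();
	}
}

static void
toy_lock_release(struct toy_lock *l)
{
	/*
	 * Release ordering: stores done while holding the lock become
	 * visible before the next owner can see the lock as free.
	 */
	atomic_store_explicit(&l->owned, 0, memory_order_release);
}

static void *
worker(void *arg)
{
	long n = *(long *)arg;

	for (long i = 0; i < n; i++) {
		toy_lock_acquire(&lock);
		counter++;
		toy_lock_release(&lock);
	}
	return (NULL);
}

/* Run nthreads workers (at most 64); return the final counter or -1. */
long
run_demo(int nthreads, long iterations)
{
	pthread_t tid[64];
	int started = 0;

	if (nthreads < 1 || nthreads > 64)
		return (-1);
	counter = 0;
	while (started < nthreads &&
	    pthread_create(&tid[started], NULL, worker, &iterations) == 0)
		started++;
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	return (started == nthreads ? counter : -1);
}
```

With both orderings in place, run_demo(4, 10000) should return exactly
40000, since every increment happens under the lock.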

Marius