Date:      Tue, 30 Aug 2011 17:27:25 +0200
From:      Marius Strobl <marius@alchemy.franken.de>
To:        Peter Jeremy <peterjeremy@acm.org>
Cc:        freebsd-sparc64@freebsd.org
Subject:   Re: 'make -j16 universe' gives SIReset
Message-ID:  <20110830152725.GA28552@alchemy.franken.de>
In-Reply-To: <20110817094541.GJ48988@alchemy.franken.de>
References:  <20110526234728.GA69750@server.vk2pj.dyndns.org> <20110527120659.GA78000@alchemy.franken.de> <20110601231237.GA5267@server.vk2pj.dyndns.org> <20110608224801.GB35494@alchemy.franken.de> <20110613235144.GA12470@server.vk2pj.dyndns.org> <20110813143807.GY48988@alchemy.franken.de> <20110816214820.GA35017@server.vk2pj.dyndns.org> <20110817094541.GJ48988@alchemy.franken.de>

On Wed, Aug 17, 2011 at 11:45:41AM +0200, Marius Strobl wrote:
> On Wed, Aug 17, 2011 at 07:48:20AM +1000, Peter Jeremy wrote:
> > On 2011-Aug-13 16:38:07 +0200, Marius Strobl <marius@alchemy.franken.de> wrote:
> > >Could you please give the following patch with SCHED_4BSD (cpu_switch()
> > >still is missing support for SCHED_ULE) with something like -j128
> > >buildworlds a try on your V890?
> > >http://people.freebsd.org/~marius/sparc64_replace_sched_lock_w_atomic.diff
> > 
> > Getting better but still not perfect.  It survived a couple of -j128
> > buildworlds with another six -j16 buildworlds running in parallel.
> 
> Thanks!
> 
> > 
> > But it still has the same issue with pho's stress test - a thr1 process is
> > blocked in urdlck.  The improvement is that there's only one stuck
> > process and it took 7? hrs at INCARNATIONS=150 instead of 1-2 hours.
> > (And it runs out of witness locks).
> > 
> 
> Well, the sole purpose of that patch is to get rid of the MD sched_lock
> usage in order to be able to add support for SCHED_ULE in a next step.
> It's not obvious why this should have an impact on the problem with
> userland mutex code. In fact using sched_lock provided more protection
> than solving this via atomic operations, which should still be sufficient
> for what we need to guarantee though. If at all I'd expect the patch to
> create problems in case I've overlooked something, not to solve any :)
> If it indeed has a positive impact on the userland mutex problem then
> my best guess is that this is a side-effect of the memory barriers the
> patch adds to the context switching. That would indicate that the cause
> of the problem in fact are missing memory barriers in the userland mutex
> code, which IMO is one of the suspicious things regarding that code.

Looking into the implementation of the atomic operations reveals that,
with the memory model used for running both the kernel and all of the
userland, we currently include redundant memory barriers in some cases.
Could you please re-fetch the patch from the above URL and test it again
(hopefully for the last time) with buildworlds? It works just fine here,
but again there could be issues that are more likely to be triggered with
more CPUs.

Regarding the problem with the userland mutex code, could you please check
whether the following patch makes a difference? Given that the previous
version of the first patch made that problem harder to trigger as a
side effect, it's probably a good idea to test the second patch separately.
http://people.freebsd.org/~marius/sparc64_casuword_membar.diff
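
For readers unfamiliar with why a compare-and-swap used for locking needs
memory barriers, here is a minimal userland sketch in portable C11. It is
not the content of either patch and does not reproduce the sparc64 problem;
it only shows the general pattern under discussion. All names (lock_word,
demo_lock, demo_unlock, run_demo) are hypothetical and chosen for this
illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical lock word for this sketch: 0 = free, 1 = held. */
static atomic_uint lock_word;
static unsigned long counter;	/* protected by lock_word */

static void
demo_lock(void)
{
	unsigned int expected = 0;

	/*
	 * Acquire ordering on a successful CAS keeps the accesses in the
	 * critical section from being reordered before the lock is taken.
	 */
	while (!atomic_compare_exchange_weak_explicit(&lock_word, &expected,
	    1, memory_order_acquire, memory_order_relaxed)) {
		expected = 0;	/* a failed CAS overwrote it */
		sched_yield();
	}
}

static void
demo_unlock(void)
{
	/*
	 * Release ordering makes the critical section's stores visible
	 * before another CPU can observe the lock as free.
	 */
	atomic_store_explicit(&lock_word, 0, memory_order_release);
}

static void *
worker(void *p)
{
	unsigned long n = *(const unsigned long *)p;

	for (unsigned long i = 0; i < n; i++) {
		demo_lock();
		counter++;
		demo_unlock();
	}
	return (NULL);
}

/* Runs nthreads workers (at most 64) and returns the final counter. */
unsigned long
run_demo(unsigned int nthreads, unsigned long iterations)
{
	pthread_t tids[64];
	int rc;

	assert(nthreads <= 64);
	counter = 0;
	atomic_store(&lock_word, 0);
	for (unsigned int i = 0; i < nthreads; i++) {
		rc = pthread_create(&tids[i], NULL, worker, &iterations);
		assert(rc == 0);
	}
	for (unsigned int i = 0; i < nthreads; i++) {
		rc = pthread_join(tids[i], NULL);
		assert(rc == 0);
	}
	(void)rc;
	return (counter);
}
```

With the acquire/release orderings in place, run_demo(4, 10000) should
always return 40000. If both were weakened to memory_order_relaxed, the
result would no longer be guaranteed on a weakly ordered machine, and the
failure would tend to show up only rarely and under load.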

Marius



