Date:      Mon, 24 Feb 2014 18:02:19 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Warner Losh <wlosh@bsdimp.com>
Cc:        "freebsd-mips@freebsd.org" <freebsd-mips@freebsd.org>
Subject:   Re: [RFC] Enable use of UserLocal Register (ULRI) if detected (patches)
Message-ID:  <alpine.BSF.2.00.1402241800120.61905@fledge.watson.org>
In-Reply-To: <D3370A85-CF9B-4857-B984-AEB74EF2B6DA@bsdimp.com>
References:  <D964DBB1-3727-4B8A-B4E3-50FD8A300818@FreeBSD.org> <092B0786-EA73-44D0-81FC-DFB56B14D4D7@bsdimp.com> <alpine.BSF.2.00.1402191759170.24900@fledge.watson.org> <D3370A85-CF9B-4857-B984-AEB74EF2B6DA@bsdimp.com>

On Wed, 19 Feb 2014, Warner Losh wrote:

>> I would note, BTW, that the current use of TLS in malloc()/free() and 
>> today's MIPS exception handler for TLS implementation do introduce a very 
>> measurable overhead.  I'm left wondering if there is something we can do 
>> for unthreaded processes to avoid taking kernel traps on every memory 
>> allocation and free for MIPSes without ULRI.  (Note that that problem is 
>> present before Stacey's patch: the reason we added ULRI support is that our 
>> hardware does support ULRI, and we can therefore avoid that nasty overhead 
>> ...)  I understand there's work on a new MIPS ABI that specifies a TLS 
>> register not requiring a trap to read on non-ULRI hardware, but I'm not 
>> sure how far that is from being available.  Certainly it will require 
>> compiler/OS/etc work before it becomes useful to us.
>
> One could easily have a global, static TLS value that gets set at startup, 
> and cleared when the first thread is forked. The gettls calls then become 
> something akin to
>
> if (global_tls) return global_tls; else return _get_tls();
>
> without changes to the ABI at all...

Our measurements suggest that instruction emulation here adds significant 
overhead, because the cost is paid on every malloc()/free() call in userspace. 
However, our platform is CPU-poor relative to its memory speed, since it is an 
FPGA-based research processor, so the effect might be less significant on 
conventional silicon.  It might be interesting for someone developing on a 
more conventional system to do a quick, informal experiment and see whether it 
makes a difference.
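
For anyone trying that experiment, the cached-pointer idea quoted above could 
be sketched roughly as follows.  This is only an illustration under stated 
assumptions, not the libc or libthr implementation: tls_cache_init(), 
tls_cache_disable(), get_tls(), slow_get_tls() and slow_path_calls are all 
hypothetical names, and the slow path is a plain function standing in for the 
trapping read of the TLS register on MIPS without ULRI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: on real non-ULRI MIPS hardware the slow path
 * would be the trapping read of the TLS register, emulated by the kernel.
 * Here it is an ordinary function so the sketch runs anywhere. */
static void *thread_pointer;   /* value the slow path would return */
static int slow_path_calls;    /* counts simulated traps */

static void *
slow_get_tls(void)
{
	slow_path_calls++;
	return (thread_pointer);
}

/* Cached TLS pointer; non-NULL only while the process is single-threaded. */
static void *global_tls;

/* Assumed to be called once at process startup. */
void
tls_cache_init(void *tp)
{
	assert(tp != NULL);
	thread_pointer = tp;
	global_tls = tp;
}

/* Assumed to be called just before the first additional thread is created. */
void
tls_cache_disable(void)
{
	global_tls = NULL;
}

/* Fast path while unthreaded; falls back to the slow path afterwards. */
void *
get_tls(void)
{
	if (global_tls != NULL)
		return (global_tls);
	return (slow_get_tls());
}
```

Counting slow_path_calls before and after tls_cache_disable() gives a rough 
idea of how many traps an unthreaded process would avoid; a real measurement 
would time malloc()/free() with and without the cache.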

Robert


