Date: Mon, 24 Feb 2014 18:02:19 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
To: Warner Losh <wlosh@bsdimp.com>
Cc: "freebsd-mips@freebsd.org" <freebsd-mips@freebsd.org>
Subject: Re: [RFC] Enable use of UserLocal Register (ULRI) if detected (patches)
Message-ID: <alpine.BSF.2.00.1402241800120.61905@fledge.watson.org>
In-Reply-To: <D3370A85-CF9B-4857-B984-AEB74EF2B6DA@bsdimp.com>
References: <D964DBB1-3727-4B8A-B4E3-50FD8A300818@FreeBSD.org> <092B0786-EA73-44D0-81FC-DFB56B14D4D7@bsdimp.com> <alpine.BSF.2.00.1402191759170.24900@fledge.watson.org> <D3370A85-CF9B-4857-B984-AEB74EF2B6DA@bsdimp.com>
On Wed, 19 Feb 2014, Warner Losh wrote:

>> I would note, BTW, that the current use of TLS in malloc()/free() and
>> today's MIPS exception handler for TLS implementation do introduce a very
>> measurable overhead. I'm left wondering if there is something we can do
>> for unthreaded processes to avoid taking kernel traps on every memory
>> allocation and free for MIPSes without ULRI. (Note that that problem is
>> present before Stacey's patch: the reason we added ULRI support is that our
>> hardware does support ULRI, and we can therefore avoid that nasty overhead
>> ...) I understand there's work on a new MIPS ABI that specifies a TLS
>> register not requiring a trap to read on non-ULRI hardware, but I'm not
>> sure how far that is from being available. Certainly it will require
>> compiler/OS/etc work before it becomes useful to us.
>
> One could easily have a global, static TLS value that gets set at startup,
> and cleared when the first thread is forked. The gettls calls then become
> something akin to
>
>     if (global_tls) return global_tls; else return _get_tls();
>
> without changes to the ABI at all...

Our measurements suggest that instruction emulation here imposes a
significant overhead because of the per-malloc/free costs in userspace.
However, our platform is CPU-poor relative to memory speed because it is an
FPGA-based research processor, so this may be a less significant factor on
conventional silicon. It might be interesting for someone developing on a
more conventional system to run a quick, informal experiment and see whether
it makes a difference.

Robert
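
For readers following along, the cached-pointer idea quoted above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not code from the patches under discussion: `slow_get_tls()` is a hypothetical stand-in for the trapping `_get_tls()` path, and `tls_cache_init()`, `tls_cache_invalidate()`, `slow_path_calls`, and `tls_block` are invented names. Only `global_tls` comes from the quoted snippet.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the expensive path (on non-ULRI MIPS this
 * would be the read that traps into the kernel). The counter exists
 * only so the effect of the cache can be observed.
 */
static int slow_path_calls = 0;
static char tls_block[64];

static void *slow_get_tls(void)
{
    slow_path_calls++;
    return tls_block;
}

/* Cached TLS pointer; NULL means "cache not valid, take the slow path". */
static void *global_tls = NULL;

/* Assumed to run once at process startup, while still single-threaded. */
static void tls_cache_init(void)
{
    global_tls = slow_get_tls();
}

/*
 * Assumed to run just before the first additional thread is created.
 * A real implementation would have to guarantee that this store
 * happens before any second thread can call get_tls().
 */
static void tls_cache_invalidate(void)
{
    global_tls = NULL;
}

/* Fast path for unthreaded processes, slow path otherwise. */
static void *get_tls(void)
{
    if (global_tls != NULL)
        return global_tls;
    return slow_get_tls();
}
```

In this sketch an unthreaded process pays for the slow path once at startup, and each later `get_tls()` call is a load and a compare. Once `tls_cache_invalidate()` has run, every call goes back to the slow path, so threaded processes see no change in behaviour.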