Date:      Mon, 16 Jun 2003 16:34:44 -0700 (PDT)
From:      Julian Elischer <julian@elischer.org>
To:        Gareth Hughes <gareth@nvidia.com>
Cc:        Andy Ritger <ARitger@nvidia.com>
Subject:   RE: NVIDIA and TLS
Message-ID:  <Pine.BSF.4.21.0306161609550.19977-100000@InterJet.elischer.org>
In-Reply-To: <2D32959E172B8F4D9B02F68266BE421401A6D7DB@mail-sc-3.nvidia.com>



On Mon, 16 Jun 2003, Gareth Hughes wrote:

> On Mon, 16 Jun 2003, Daniel Eischen wrote:
> > 
> > Again, %gs isn't per-thread; it's per-KSE.  Plus, we're reserving
> > TLS for one vendor/library.  What happens when someone else comes
> > along and wants the same thing?  I'd much rather see someone push
> > for a new OpenGL spec with better interfaces/APIs.
> 

I think that the problem is that the access method for TLS is dependent
on which library is used.

In the multiplexed thread library, %gs points to the current KSE (Kernel
Schedulable Entity; think virtual CPU), and THAT has a pointer to the
current thread. With thousands of threads going in and out of
runnability each tick (without notifying the kernel), we don't want to
keep changing %gs in userland, as that's slow. (There are plenty of apps
that have MANY threads.) These threads run entirely in userland, and
control switches between them at an alarming rate. Anything that slows
down context switches has a bad effect on the speed at which these
programs (e.g. some Java implementations and programs) run.

In the 1:1 (or N:N) thread library the context switch times are
considerably larger, as there is kernel interaction on each and every
context switch.
Overhead from switching %gs in this library would probably be buried in
the noise. It would probably be an acceptable solution.

In the single-streamed pthreads library (libc_r) that is currently in
use, %gs is not used, so your code is not colliding. There isn't
explicit support for __thread there, though (curthread->TLS) would be
all that is required, since it is not truly multithreaded.

The trouble is that each of these would require a different mechanism to
reach TLS, and the compiler cannot know ahead of time which one to use.
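
To make the three access paths concrete, here is a minimal sketch in
plain C. It is not FreeBSD code: the struct layouts, the variable names
(gs_kse, gs_thread) and the function names are all hypothetical, and
ordinary globals stand in for whatever %gs would reference. It only
illustrates that each library reaches TLS through a different chain of
pointers, and that a userland switch in the multiplexed case can leave
the %gs stand-in untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts, for illustration only. */
struct tls_block { int value; };
struct thread    { struct tls_block *tls; };
struct kse       { struct thread *curthread; };

/* Stand-ins for what %gs (or a plain global) would reference. */
static struct kse    *gs_kse;     /* multiplexed library: %gs -> KSE */
static struct thread *gs_thread;  /* 1:1 library: %gs -> thread      */
static struct thread *curthread;  /* libc_r: ordinary global, no %gs */

/* Multiplexed: segment base -> KSE -> current thread -> TLS. */
struct tls_block *tls_multiplexed(void)
{
    assert(gs_kse != NULL && gs_kse->curthread != NULL);
    return gs_kse->curthread->tls;
}

/* 1:1: segment base -> thread -> TLS. */
struct tls_block *tls_one_to_one(void)
{
    assert(gs_thread != NULL);
    return gs_thread->tls;
}

/* libc_r: curthread -> TLS, as in (curthread->TLS) above. */
struct tls_block *tls_libc_r(void)
{
    assert(curthread != NULL);
    return curthread->tls;
}

/* A userland switch in the multiplexed model: only the KSE's
 * thread pointer changes; the %gs stand-in is left alone. */
void user_switch(struct thread *next)
{
    assert(gs_kse != NULL);
    gs_kse->curthread = next;
}
```

An object file compiled against one of these accessors would be wrong
for the other two libraries, which is the problem described above.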

> I don't think there's a library out there that has the strict
> performance requirements that OpenGL does.  Of course, if FreeBSD
> supported the ELF TLS standard.

I may be wrong, but I don't think it is a standard yet,
especially for the reason that we see here:
it requires that the compiler know which threading library is in use.

We could certainly implement efficient TLS code generation for each
library, but which one would be compiled in when you compile a .o file
that may be used with any library?

> this point would be moot because
> applications and libraries would automatically get fast
> thread-local storage.  If not, and another library really did need
> the same kind of fast TLS access, what's wrong with just allocating
> another static block after the libGL one?  Your internal data
> structures would work fine, libGL would work fine because you
> haven't changed the location of its data block, and the new library
> would access its data directly.  The only problem with this scheme
> is if you move the block, or change the way it is accessed, this
> would break binary compatibility.

A single library is not going to get its own block in the
system thread descriptor, but it makes great sense to allocate a pointer
there so that the system can support TLS in a uniform manner for many
applications.

If we place a pointer (actually a couple) there
for the .tbss and .tdata segments, then we can get to those segments
quickly (though possibly requiring 2 instructions; I'm not an
x86 assembler guru).
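
A rough sketch of that idea in portable C, under stated assumptions:
struct thread_desc and the tls_* functions are hypothetical names, not
an existing FreeBSD interface, and heap allocations stand in for
however a real thread library would place the per-thread copies. Each
thread gets its own copy of the initialised data and its own
zero-filled area, and a variable is reached by loading the pointer from
the descriptor and adding an offset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical thread descriptor carrying the two pointers. */
struct thread_desc {
    unsigned char *tdata;  /* this thread's copy of initialised TLS */
    unsigned char *tbss;   /* this thread's zero-filled TLS         */
};

/* Build one thread's TLS areas from the initialisation image.
 * Returns 0 on success, -1 if an allocation fails. */
int tls_setup(struct thread_desc *td, const void *tdata_image,
              size_t tdata_size, size_t tbss_size)
{
    assert(td != NULL);
    assert(tdata_image != NULL || tdata_size == 0);
    td->tdata = malloc(tdata_size ? tdata_size : 1);
    td->tbss = calloc(tbss_size ? tbss_size : 1, 1);
    if (td->tdata == NULL || td->tbss == NULL) {
        free(td->tdata);
        free(td->tbss);
        td->tdata = td->tbss = NULL;
        return -1;
    }
    if (tdata_size != 0)
        memcpy(td->tdata, tdata_image, tdata_size);
    return 0;
}

/* Address of an initialised variable: load pointer, add offset. */
void *tls_tdata_addr(const struct thread_desc *td, size_t offset)
{
    assert(td != NULL && td->tdata != NULL);
    return td->tdata + offset;
}

/* Address of a zero-initialised variable: load pointer, add offset. */
void *tls_tbss_addr(const struct thread_desc *td, size_t offset)
{
    assert(td != NULL && td->tbss != NULL);
    return td->tbss + offset;
}

/* Release one thread's TLS areas. */
void tls_teardown(struct thread_desc *td)
{
    assert(td != NULL);
    free(td->tdata);
    free(td->tbss);
    td->tdata = td->tbss = NULL;
}
```

How the descriptor itself is found (via %gs, a KSE, or a global) is
left out on purpose, since that is exactly the part that differs
between the thread libraries.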

I wouldn't think of adding such a thing for a single app or library, but
support for ELF TLS is probably of high enough importance
that it is worthwhile.

As I said, though: which library gets to support it? The one with one
indirection, the one with 2 indirections, or the one with no
indirections?

The trick is that because it is a pre-link-time thing, it cannot handle
the case where there is more than one binary interface to the threads.
That is why it is not in the POSIX threading model and probably never
really will be, except in a slow form.
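
For comparison, the per-thread storage that POSIX does define is
thread-specific data, reached through library calls. Whether that is the
"slow form" meant above is an assumption on the editor's part. The
sketch below uses only standard pthread calls; the tsd_* wrappers and
worker() are hypothetical names for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_key_t key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once, whichever thread gets here first. */
static void make_key(void)
{
    int rc = pthread_key_create(&key, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Store a per-thread value; this is a function call, not a
 * compiler-generated segment-relative load. */
int tsd_set(void *value)
{
    pthread_once(&key_once, make_key);
    return pthread_setspecific(key, value);
}

/* Fetch the calling thread's value (NULL if never set). */
void *tsd_get(void)
{
    pthread_once(&key_once, make_key);
    return pthread_getspecific(key);
}

/* Thread body: report what a fresh thread sees in its slot. */
static void *worker(void *arg)
{
    (void)arg;
    return tsd_get();
}

/* Returns 1 if a new thread sees its own empty slot while the
 * caller's value stays intact, 0 otherwise. */
int tsd_demo(void)
{
    int mine = 1;
    void *seen = &mine;
    pthread_t t;

    if (tsd_set(&mine) != 0)
        return 0;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return 0;
    if (pthread_join(t, &seen) != 0)
        return 0;
    return seen == NULL && tsd_get() == &mine;
}
```

This works with any conforming thread library precisely because the
caller never needs to know how the library finds the current thread.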

(BTW, have you looked at the speed of function calls on modern
PCs? I think you'll be surprised.)




