Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 23 Jun 2003 10:06:27 -0500
From:      "Jacques A. Vidrine" <nectar@FreeBSD.org>
To:        Mike Makonnen <mtm@identd.net>
Cc:        kris@obsecurity.org
Subject:   Re: making nsswitch(3) thread-safe (was Re: Removal of bogus gethostbyaddr_r())
Message-ID:  <20030623150627.GD82167@madman.celabo.org>
In-Reply-To: <20030619055933.FNBD4805.out003.verizon.net@kokeb.ambesa.net>
References:  <20030618210812.GB21622@rot13.obsecurity.org> <Pine.GSO.4.10.10306182206300.3647-100000@pcnet5.pcnet.com> <20030619055933.FNBD4805.out003.verizon.net@kokeb.ambesa.net>

next in thread | previous in thread | raw e-mail | index | archive | help
[Sorry for the late reply.  As you may or may not have noticed :-)
 my FreeBSD hiatus continues :-(  I'll be back, just not sure when
 yet.]

On Thu, Jun 19, 2003 at 01:59:32AM -0400, Mike Makonnen wrote:
> 
> [ Jacques, I'm CC'ing you since this affects nss ]

Thanks, Mike! -- for CC'ing me and for looking at this in the first
place.

> I just took a look into what it would take to make the gethostby*_r functions
> thread safe, and it isn't pretty. The "thread-unsafeness" goes all the way
> down into the individual methods for the various nsswitch(3) databases (dns,
> files, etc). So, the most expedient and least invasive way to go about it would
> be to put a mutex in the gethostby* functions that is dependent on __isthreaded
> being true. This, offcourse, means that the entire call-chain (from top to
> bottom) is locked for the duration of a call to one of these functions, which
> can be a considerable amount of time if it has to timeout. During this time no
> other thread in the application can resolve a host because it will be blocked on
> said mutex.

You couldn't take this approach even if you wanted.  No part of libc
should hold a lock while making calls through nsdispatch.  NSS modules
are free to call back into libc.  Imagine holding a lock in
gethostbyname_r, calling through nsdispatch and getting to, say, the
LDAP backend.  The LDAP backend might itself invoke gethostbyname_r
or some other nsdispatch consumer resulting in recursive locking or
lock order reversals.
 
> To my mind the way to fix this properly is to make the nsdispatch(3)
> call-chain in libc thread-safe from the ground-up. This is a huge task in and of
> itself, but is complicated even further by the fact that whatever we do can't
> change the semantics of the existing user-visible functions.

Yes, this is the approach that must be taken.  The nsdispatch engine
is itself thread-safe.  The getpw*_r, getgr*_r functions are
thread-safe.  I believe that with the exception of the `dns' backend,
the existing backends are thread-safe.

> I think there are three points to keep in mind:
> 
> 1. The nsswitch(3) database functions should be made thread-safe, but at the
>     same time not change the thread-unsafe semantics that third-party apps
>     might expect (through nsdispatch(3)).
> 2. The thread-unsafeness starts at the bottom (low-level functions) of the
>     call-chain, in the nsswitch(3) database functions.
> 3. Because so many databases (hosts, passwd, group, etc) will be affected by
>     this it would be too huge a task to do it all at once. The safest approach
>     is probably to introduce the changes separately a few at a time.

What do you mean by the `nsswitch(3) database functions' ?  Do you
mean the backends?  If so, yes, the backends must ultimately be
thread-safe, and it is best to manage them one database at a time
i.e., all built-in backends --- files, nis, dns --- for a database
must be done at once.

> We can achieve this by
> Introducing a variable that gets passed down the call-chain telling the
> nsswitch(3) database functions whether the "destination address" (return
> value) will by allocated by the caller or not. If this variable is false then
> the routines can use a static buffer, otherwise, they will assume the caller has
> allocated the necessary space.
>
> Most of the functions will need a lot more work than this to make them
> fully thread-safe. The advantage of doing it this way is that it maintains
> the status-quo while allowing us to make the individual subsystems
> thread-safe separately and as time and resources permit.
> 
> The following patch shows what I have in mind. It won't
> compile just yet, but it might make what I'm saying a little clearer :-)
> This initial phase is really just a mechanical change of argument lists. It
> doesn't introduce any thread-safeness on its own. It just makes it easier to
> introduce such changes safely on a subsytem by subsystem basis.
> 
> Basically, the _nsdispatch() functionality is moved into nsdispatch_common(),
> which takes a va_list as its last argument instead of a variable number of
> arguments. It also takes one additional argument(int is_r) so callers can tell
> it what kind of semantics they want. Applications calling nsdispatch() get a
> wrapper that passes a false (0) value in the is_r argument. Callers from within
> libc will continue calling_nsdispatch(), which also grows an additional argument
> (int is_r) that callers can pass down to the nsswitch(3) database functions
> through nsdispatch_common().
> 
> What do you think? Should I continue with this or do you think it's bogus?
>
> The important point to keep in mind is that we have to keep the dual semantics
> (thread-safe and unsafe) all the way down the call-chain because only the
> top-level and low-level functions know the actual memory type and size that
> needs to be allocated for the return value to the library users. So, we can't
> make the low-level nsswitch(2) database functions unconditionally thread-safe
> and restrict the dual semantics to the top-level functions.

I think this is bogus :-)  I must say I don't really understand what
you are trying to accomplish ... maybe we should get on IRC and chat.
Also, please look at getpwent.c to see how thread-safe versus
non-thread-safe entry points to getpw* are handled if you have not
already.  I can't see any need to add this `is_r' argument.  I don't
see the purpose of it.

The hardest part of making the thread-safe gethost*_r functions, IMHO
(after having done some work in this area) is untangling the
KAME-contributed code.  There is a lot of incest between the
implementations of the various high-level resolver routines
(gethostbyname, getaddrinfo, getipnodebyname, etc.) that complicates
things.   But the approach is conceptually simple:

   1. Write gethostbyname_r (thread-safe), which invokes
      nsdispatch("gethostbyname_r", ...).

   2. Make each back-end thread-safe.  Starting with a global lock is
      reasonable.  I don't think we can ever do anything efficient
      with the BIND 4 resolver, anyway.

Only I think you want to do the whole family of functions in one blow,
not one-at-a-time.  (But maybe I'm wrong here, 'cause that has
certainly kept me from committing any partial work in this area.)

Note also that the libc<-->NSS modules interface is constrained.
Nominally we use the GNU libc interface, which has different entry
points for e.g. gethostbyname and gethostbyname_r, with a set number
and type of parameters.  We don't want to change these interface lest
we make porting NSS modules to FreeBSD harder.  We can add some
interfaces, though, where needed (i.e. getaddrinfo).

You can contact me in a number of ways to discuss:

  - EFnet nickname `nectar' (or whatever nick shows up for 
    `nectar@free.bsd')
  - AIM `nectar5'
  - ICQ `17783107'
  - Yahoo! `nectar'
  - MSN `nectar523@hotmail.com'
  - Voice 612-743-3364

It would be best if we could `schedule' something.  I apologize, but
my freetime is very slim lately.

Cheers,
-- 
Jacques Vidrine   . NTT/Verio SME      . FreeBSD UNIX       . Heimdal
nectar@celabo.org . jvidrine@verio.net . nectar@freebsd.org . nectar@kth.se



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20030623150627.GD82167>