Date:      Wed, 24 Aug 2005 07:22:54 +0200
From:      Christophe Yayon <lists@nbux.com>
To:        freebsd-hackers@freebsd.org
Cc:        deischen@freebsd.org
Subject:   Re: nagios and freebsd threads issue : help please ...
Message-ID:  <430C042E.70009@nbux.com>

Hi all,

Here is my copy/paste from freebsd-hackers to the nagios-devel list, and the
answer from a Nagios developer.


Christophe Yayon wrote:

 >> Hi again,
 >>
 >> After some discussions on the freebsd-hackers mailing list, here is a
 >> summary:
 >>
 >> 1. There is a recommendation (or a suggestion) for what to do after a
 >> fork():
 >> http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html
 >> In other words, "It is suggested that programs that use fork() call an
 >> exec function very soon afterwards in the child process, thus resetting
 >> all states. In the meantime, only a short list of async-signal-safe
 >> library routines are promised to be available."
 >> Note *suggested*. This is a recommendation to protect against a shoddy
 >> pthread implementation. The thread specifications rule that only the
 >> thread calling fork() is duplicated, which is what leads to the
 >> recommendation in the first place (other threads holding locks aren't
 >> around to release them in the new execution context).
 >>
 >>
 >> 2. It appears that Nagios does the following after a fork:
 >> in base/util.c:
 >>         (1) Becomes the process group leader by calling setpgid(0, 0);
 >>         (2) Calls something called set_all_macro_environment_vars(TRUE).
 >>             This calls snprintf a bunch, as well as setting variables
 >>             by saving them to malloced memory.  This save is done
 >>             with strcpy and strcat.  setenv is then called to try to
 >>             export them.  Memory is then freed with free(3).
 >>         (3) All signal handlers are reset
 >>         (4) The right part of the pipe is closed
 >>         (5) A SIGALRM handler is installed and an alarm set.
 >>         (6) Checks to see if it is executing an embedded perl script,
 >>             then tries to execute it if so.  This has the feel of
 >>             being too much after the fork.
 >>         (7) Calls popen on the command if not.
 >>         (8) Reads the output of the command using fgets.
 >>         (9) Closes the other end of the pipe
 >>         (10) Unsets all env vars.
 >>         (11) Calls _exit()
 >>
 >> in base/checks.c
 >>         (1) set_all_macro_environment_vars(TRUE)
 >>         (2) forks again
 >>         (3) grandchild:
 >>                 resets handler, setpgid, etc.
 >>                 if perl script, do embedded perl, otherwise popen.
 >>                 lots of read/write to pipe.
 >>
 >> Likewise, in base/commands.c fork is also called for similar things.
 >> There are other places that also call popen...
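The child-side sequence above mixes many calls that are not on the POSIX async-signal-safe list (snprintf, malloc, strcpy/strcat-built buffers, setenv, popen, fgets, free). As a minimal sketch of the alternative, assuming the check is run through /bin/sh and needs one extra environment variable: build everything in the parent before fork(), and let the child call only setpgid(), signal(), alarm(), execve() and _exit(), all of which are on that list. The function run_check, its parameters and the fixed PATH value are hypothetical, not Nagios code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `command` via /bin/sh with NAME=value in its
 * environment.  All non-async-signal-safe work (snprintf, malloc) happens
 * in the parent BEFORE fork().  Returns the command's exit status, or -1
 * on error or if the command was killed (e.g. by the timeout alarm). */
int run_check(const char *command, const char *name, const char *value,
              unsigned timeout_secs)
{
    int n = snprintf(NULL, 0, "%s=%s", name, value);
    if (n < 0)
        return -1;
    char *envvar = malloc((size_t)n + 1);
    if (envvar == NULL)
        return -1;
    snprintf(envvar, (size_t)n + 1, "%s=%s", name, value);

    /* argv and envp are prepared up front; the child only reads them. */
    char *const argv[] = { "sh", "-c", (char *)command, NULL };
    char *const envp[] = { envvar, "PATH=/bin:/usr/bin", NULL };

    pid_t pid = fork();
    if (pid == -1) {
        free(envvar);
        return -1;
    }
    if (pid == 0) {
        /* Child: only async-signal-safe calls from here until execve(). */
        setpgid(0, 0);               /* own process group, as in step (1) */
        signal(SIGPIPE, SIG_DFL);    /* undo any SIG_IGN inherited from parent */
        signal(SIGALRM, SIG_DFL);    /* so the alarm below terminates a hung check */
        alarm(timeout_secs);         /* stays pending across execve() */
        execve("/bin/sh", argv, envp);
        _exit(127);                  /* reached only if execve() failed */
    }

    free(envvar);                    /* parent's copy is no longer needed */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because caught signal handlers are reset to their defaults by execve() anyway and a pending alarm survives the exec, this arrangement covers steps (1), (3) and (5) without the setenv/unsetenv and malloc/free round trip in the child.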
 >>
 >>
 >> 3. You can only execute async-signal-safe functions after a fork()
 >> from a threaded application.  free(), malloc(), popen(), and fgets()
 >> are not async-signal-safe.


In a proper implementation they are. Read malloc/malloc.c from
glibc-2.3.5 and you'll see. The first line of it reads

"/* Malloc implementation for multiple threads without lock contention"

fgets() must also be async-safe, since it's passed its storage buffer
by the calling function. It can contain races if several threads (or
programs, for that matter) try to read FIFOs at the same time or are
trying to store things to the same piece of memory, but that's neither
new, strange, nor in any way non-obvious. Obviously, fgets() relies on
lower-level IO code (read() in this case) which must be thread-safe on
account of being a syscall inside a multitasking kernel.
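For comparison, POSIX does put read() on the async-signal-safe list but not fgets(), which works through the stream's buffer and, in threaded C libraries, typically its lock. A child that wants to stay strictly within that list could read the pipe with read() directly. A minimal sketch; the helper name read_all is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until EOF or until cap-1 bytes are
 * stored, then NUL-terminate.  Uses only read(), which is on the POSIX
 * async-signal-safe list.  Returns the number of bytes stored, or -1. */
ssize_t read_all(int fd, char *buf, size_t cap)
{
    size_t used = 0;
    if (cap == 0)
        return -1;
    while (used < cap - 1) {
        ssize_t r = read(fd, buf + used, cap - 1 - used);
        if (r == 0)
            break;                 /* EOF: writer closed its end */
        if (r < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        used += (size_t)r;
    }
    buf[used] = '\0';
    return (ssize_t)used;
}
```

Unlike fgets(), this does not stop at a newline; splitting the buffer into lines is left to the caller.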

popen() forks and calls execve immediately. If this isn't thread-safe
then there's no way of executing external programs in multithreaded
applications short of implementing popen() directly (which isn't exactly
difficult, but still).
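Implementing popen() directly is indeed short. A minimal sketch for the read-only case, assuming the command is run through /bin/sh -c the way popen() does it; spawn_read and spawn_close are hypothetical names. Between fork() and execve() the child calls only close(), dup2(), execve() and _exit(), all async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical popen(command, "r") replacement.  On success returns a
 * stream connected to the command's stdout and stores its pid. */
FILE *spawn_read(const char *command, pid_t *pid_out)
{
    int fds[2];
    if (pipe(fds) == -1)
        return NULL;

    char *const argv[] = { "sh", "-c", (char *)command, NULL };

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }
    if (pid == 0) {
        /* Child: async-signal-safe calls only, then exec immediately. */
        close(fds[0]);                     /* child never reads */
        if (fds[1] != STDOUT_FILENO) {
            dup2(fds[1], STDOUT_FILENO);   /* command's stdout -> pipe */
            close(fds[1]);
        }
        execve("/bin/sh", argv, environ);
        _exit(127);                        /* exec failed */
    }

    close(fds[1]);                         /* parent never writes */
    FILE *fp = fdopen(fds[0], "r");
    if (fp == NULL) {
        close(fds[0]);
        waitpid(pid, NULL, 0);
        return NULL;
    }
    *pid_out = pid;
    return fp;
}

/* pclose() counterpart: close the stream, reap the child, and return its
 * exit status (-1 on error or abnormal termination). */
int spawn_close(FILE *fp, pid_t pid)
{
    int status;
    fclose(fp);
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent is free to use stdio on the returned stream, since the restriction applies only to the child. A production version would also need to keep the pipe descriptors from leaking into children that other threads fork at the same moment, for example by marking them close-on-exec.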


 >> The list of async-signal-safe functions
 >> is here: http://www.opengroup.org/onlinepubs/009695399/nframe.html
 >> The restriction on fork() is here (20th bullet down):
 >> http://www.opengroup.org/onlinepubs/009695399/nframe.html
 >>


Both of those links point to the same document, which is just the
frameset for the navigation-frames.

For async-safe functions, this is the proper URL:
http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html#tag_02_09_01

For the fork() specification, the document is here:
http://www.opengroup.org/onlinepubs/009695399/functions/fork.html

The 20th bullet is this:
-----------
"A process shall be created with a single thread. If a multi-threaded
process calls fork(), the new process shall contain a replica of the
calling thread and its entire address space, possibly including the
states of mutexes and other resources. Consequently, to avoid errors,
the child process may only execute async-signal-safe operations until
such time as one of the exec functions is called. [THR] [Option Start]
Fork handlers may be established by means of the pthread_atfork()
function in order to maintain application invariants across fork()
calls. [Option End]

When the application calls fork() from a signal handler and any of the
fork handlers registered by pthread_atfork() calls a function that is
not asynch-signal-safe, the behavior is undefined."
-----------
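The [THR] option in the passage above can be sketched as follows, assuming a hypothetical application mutex table_lock that guards data the child needs (for example a macro table). The prepare handler takes the lock in the forking thread just before fork(), so no other thread can be halfway through an update when the address space is copied; the parent and child handlers release it afterwards. This only covers the application's own locks: per the quoted text, the child is still only promised async-signal-safe functions until it calls exec, and pthread_mutex_trylock() is used in the child here purely to demonstrate the lock state.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical application lock protecting shared state. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* prepare: runs in the forking thread before fork(). */
static void prepare(void) { pthread_mutex_lock(&table_lock); }

/* parent and child: run after fork() in each process and release the
 * lock, so both start with the shared state consistent and unlocked. */
static void release(void) { pthread_mutex_unlock(&table_lock); }

int install_fork_handlers(void)
{
    return pthread_atfork(prepare, release, release);
}

/* Forks a child that tests whether table_lock is free; returns the
 * child's exit status (0 = lock was free), or -1 on error. */
int child_can_take_lock(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        int ok = pthread_mutex_trylock(&table_lock) == 0;
        _exit(ok ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The prepare handler must not be reached while the forking thread already holds table_lock, or it will deadlock on a default mutex.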

Also note that "From the application's perspective, a fork() call should
appear atomic.", which implicitly makes fork() an async-safe function,
although the execution that follows it may not be. It also warns that
improper implementations make it less so.



 >>
 >> 4. Some FreeBSD developers think that fork() is better handled
 >> differently in libpthread (and probably libthr) than was done in libc_r.
 >> We thought it better not to try and reinitialize libpthread (and to
 >> some extent libc) because it is messy, and to expose non-portable
 >> applications.
 >>


This is funny, because nagios apparently runs properly on Linux, HPUX,
Solaris, Irix, AIX and Tru64. To me that seems to indicate that Nagios
is very portable indeed and that the BSD fellows somehow botched it. I
might be wrong, but...


 >>
 >>
 >> Possible solutions:
 >>
 >> a. (The best, I think.) Modify the Nagios code to follow
 >> recommendation 1. We are talking about portability, not
 >> performance...
 >>


This would involve a fairly large change in the way things are done. I
for one am all for implementing a different parallelisation mechanism
but I'm fairly certain Ethan won't be too thrilled if I rewrite 40% of
the code that's currently the Nagios core.


 >> b. A possible workaround for Nagios on FreeBSD (and, I think, other Unix
 >> systems except Linux) is to use another threads library. For FreeBSD,
 >> using GNU/pth (which is in the ports) seems to completely resolve the
 >> problem (but I think it's ugly to have to use another, non-native,
 >> threads lib...).
 >>
 >>
 >>
 >> What do you think about this?



In summary: some thread libraries work while others don't (the native
*BSD one being the only one that doesn't), so I'd say it's time to fix that
thread library, although I favor the rewrite-nagios approach as an
exercise in intellectual masturbation and would be quite willing to do
the actual work of it, provided I can be somewhat sure it isn't wasted.


 >> Sorry for my English (I am French...)
 >>


Your English is far better than that of most native English speakers I've
come across.


 >>
 >> PS: thanks to all the freebsd-hackers posters who helped summarize the
 >> problem (Warner Losh, Daniel Eischen, Alexey Vesnin).



