Date:      Wed, 22 Jun 2005 07:43:06 -0700
From:      Luigi Rizzo <rizzo@icir.org>
To:        Charles Sprickman <spork@fasttrackmonkey.com>
Cc:        hackers@freebsd.org
Subject:   Re: Nagios and threads
Message-ID:  <20050622074306.C92493@xorpc.icir.org>
In-Reply-To: <Pine.OSX.4.61.0506201654400.374@gee5.nat.fasttrackmonkey.com>; from spork@fasttrackmonkey.com on Mon, Jun 20, 2005 at 04:56:36PM -0400
References:  <Pine.OSX.4.61.0506201654400.374@gee5.nat.fasttrackmonkey.com>


--EVF5PPMfhYS0aIcm
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Having also read the rest of this thread, I wonder whether there
is any relationship with an issue I found a few days ago while
debugging asterisk. It happens when linking the code with libc_r,
but maybe some of the bugs in libc_r were also carried over into
other thread libraries.

	cheers
	luigi

--------------------
Probably a known issue, but I thought it worthwhile reporting it,
if nothing else for archival purposes.

I think our userland thread library (libc_r) has some bugs in
handling descriptors.  I can reproduce the behaviour on -current
and 4.x, and I believe it applies to 5.x too.  

Following is a description of the problem and some code to replicate it.
The code includes a workaround, but it is not particularly nice.

Any better ideas? I am not sure what to do, but perhaps the
only sensible thing is to add a note with this workaround
(or better ones, if available) to our pthreads manpage.

--- PROBLEM DESCRIPTION ---

Basically, our libc_r keeps two views of i/o descriptors. The
"external" view is for threads and reflects the modes requested by
the threads (blocking or not, etc.); the "internal" view is how the
descriptors are actually set in the kernel -- and there they should
always be set as O_NONBLOCK to avoid blocking on a syscall.
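
As a minimal sketch of how the mode of a descriptor can be inspected and
changed: the two helpers below (their names are made up for this note and
are not part of libc_r) read the file status flags with fcntl() and test
or flip O_NONBLOCK. In a program that is not linked with libc_r this is
the kernel's view. Under libc_r, fcntl() is presumably wrapped and reports
the thread's view, which is why thre.c further down calls __sys_fcntl()
for the kernel side.

```c
#include <fcntl.h>

/*
 * Hypothetical helper, not part of libc_r.
 * Returns 1 if O_NONBLOCK is set on fd, 0 if it is clear,
 * -1 if fcntl() fails (e.g. fd is not open).
 */
int
fd_is_nonblocking(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl == -1)
		return -1;
	return (fl & O_NONBLOCK) ? 1 : 0;
}

/*
 * Hypothetical helper: set (on != 0) or clear (on == 0) O_NONBLOCK,
 * leaving the other status flags untouched. Returns 0 or -1.
 */
int
fd_set_nonblocking(int fd, int on)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl == -1)
		return -1;
	fl = on ? (fl | O_NONBLOCK) : (fl & ~O_NONBLOCK);
	return (fcntl(fd, F_SETFL, fl) == -1) ? -1 : 0;
}
```

Reading the flags before writing them back matters here, because
F_SETFL replaces the whole set of status flags rather than a single bit.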

The bug occurs when a process does a fork(), and then either
a close() or an exec() -- a similar thing also occurs with popen().
The relevant source code is in

    /usr/src/lib/libc_r/uthread/uthread_execve.c
    /usr/src/lib/libc_r/uthread/uthread_close.c

Right before the exec(), the internal descriptors are put into
blocking mode if the external ones are blocking, and they are only
reset to O_NONBLOCK after termination of the child (upon SIGCHLD).
The same occurs for close().

Note that close() has hacks to leave pipes alone, but the same
code is not present in the execve() case where instead I believe
it would be necessary. Another thing to note is that there is
some kind of 'fate sharing' among the stdio descriptors (0, 1, 2)
which is not totally clear to me, but seems to require setting
O_NONBLOCK on all 3 to make sure that they are not changed to
blocking mode.

Because descriptors are shared between parent and child, for the
lifetime of the child the descriptors in the parent will be blocking,
and the scheduling of threads will be completely broken (with a
userland thread library, one thread blocking in a syscall stalls
the whole process).
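
The sharing itself can be seen without libc_r. Status flags such as
O_NONBLOCK belong to the open file that parent and child share after
fork(), not to the per-process descriptor number. The sketch below is a
made-up demo (the function name is hypothetical) using only the plain C
library: the parent sets O_NONBLOCK on one end of a pipe, a child clears
it, and the parent then looks at its own descriptor again.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo, plain libc (no libc_r).
 * Returns 1 if the parent still sees O_NONBLOCK after the child
 * cleared it, 0 if the child's change is visible in the parent,
 * -1 on error.
 */
int
nonblock_survives_child(void)
{
	int fd[2], fl, status;
	pid_t p;

	if (pipe(fd) == -1)
		return -1;
	fl = fcntl(fd[0], F_GETFL);
	if (fl == -1 || fcntl(fd[0], F_SETFL, fl | O_NONBLOCK) == -1)
		goto fail;
	p = fork();
	if (p == -1)
		goto fail;
	if (p == 0) {
		/* child: same open file, so this also affects the parent */
		fl = fcntl(fd[0], F_GETFL);
		fcntl(fd[0], F_SETFL, fl & ~O_NONBLOCK);
		_exit(0);
	}
	if (waitpid(p, &status, 0) == -1)
		goto fail;
	fl = fcntl(fd[0], F_GETFL);	/* parent's view after the child */
	close(fd[0]);
	close(fd[1]);
	if (fl == -1)
		return -1;
	return (fl & O_NONBLOCK) ? 1 : 0;
fail:
	close(fd[0]);
	close(fd[1]);
	return -1;
}
```

A return value of 0 means the child's change reached the parent, which
is the same mechanism by which the mode changes described above end up
in the parent's descriptors.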

The only fix I have found is to act as follows:

        pipe(fd);       /* create a pipe with the child */
        p = fork();
        if (p == 0) { /* child */
            /* call fcntl() _before_ close() to avoid resetting
             * O_NONBLOCK on the internal descriptors. After that,
             * close the descriptors not needed in the child.
             */  
            for (i=0; i < getdtablesize(); i++) {
                int fl = fcntl(i, F_GETFL);     /* fcntl() returns int */
                if (fl != -1 && i != fd[0]) {
                    /* open and must be closed in the child */
                    fcntl(i, F_SETFL, O_NONBLOCK | fl);
                    close(i);
                }
            }
            /* standard stuff (dup2, exec*(), ...) */
            dup2(fd[0], STDOUT_FILENO); /* as an example */
            execl(....);
        } else { /* parent */
            close(fd[0]);       /* close child end. */
            ...
        }

but of course this is rather unintuitive. On the other hand,
I have no idea of a better way to address the problem, and being
fairly new to threads programming maybe others know better.
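
For reuse, the loop above could be wrapped in a small function. The
following is only a sketch under that assumption: the name
nonblock_then_close() and its [lo, hi) range parameters are made up
here, and it has not been tried against libc_r. It keeps the ordering
that matters for the workaround, fcntl() first and close() second.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical variant of the workaround loop, limited to [lo, hi).
 * For every open descriptor in the range except 'keep', set
 * O_NONBLOCK first and only then close it.
 * Returns the number of descriptors closed.
 */
int
nonblock_then_close(int lo, int hi, int keep)
{
	int i, fl, n = 0;

	for (i = lo; i < hi; i++) {
		if (i == keep)
			continue;
		fl = fcntl(i, F_GETFL);
		if (fl == -1)
			continue;	/* not open */
		(void)fcntl(i, F_SETFL, fl | O_NONBLOCK);
		if (close(i) == 0)
			n++;
	}
	return n;
}
```

In the child, right after fork(), the equivalent of the original loop
would be nonblock_then_close(0, getdtablesize(), fd[0]). Whether a
narrower range is enough depends on which descriptors the parent's
threads use in blocking mode.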

I am attaching two minimal programs to demonstrate the bug.

simple.c is a simple program (linked against the regular C library)
	cc -o simple simple.c

that only plays with blocking mode on the descriptors.

thre.c is meant to be linked with libc_r.
	cc -o thre thre.c -lc_r

It does a fork and exec of the other program.
If you call it without arguments, it does not implement the
above workaround, and you see how the 'internal' descriptors
change to blocking mode. If you call it with an argument, it
implements the workaround.

	enjoy
	luigi

On Mon, Jun 20, 2005 at 04:56:36PM -0400, Charles Sprickman wrote:
> Hello,
> 
> Just curious if there's any regulars here who would like to help Ethan 
> out:
> 
> http://nagios.sourceforge.net/docs/2_0/whatsnew.html
> 
> "Known Issues
> 
> There are a few known issues with the Nagios 2.0 code at the moment. 
> Hopefully some of these will be fixed before 2.0 is released as stable...
> 
> 1. FreeBSD and threads. On FreeBSD there's a native user-level 
> implementation of threads called 'pthread' and there's also an optional 
> ports collection 'linuxthreads' that uses kernel hooks. Some folks from 
> Yahoo! have reported that using the pthread library causes Nagios to pause 
> under heavy I/O load, causing some service check results to be lost. 
> Switching to linuxthreads seems to help this problem, but not fix it. The 
> lock happens in liblthread's __pthread_acquire() - it can't ever acquire 
> the spinlock. It happens when the main thread forks to execute an active 
> check. On the second fork to create the grandchild, the grandchild is 
> created by fork, but never returns from liblthread's fork wrapper, because 
> it's stuck in __pthread_acquire(). Maybe some FreeBSD users can help out 
> with this problem."
> 
> Thanks,
> 
> Charles
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

--EVF5PPMfhYS0aIcm
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="thre.c"

/*
 * test descriptor issues on threads.
 *
 * compile with cc -o thre thre.c -lc_r
 */

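#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>

/*
 * libc_r internals, not declared in any public header. The prototypes
 * are an assumption: they use the int return type that the compiler
 * would otherwise assume for an undeclared function.
 */
int _thread_fd_getflags(int);
int __sys_fcntl(int, int, ...);

/* print the libc_r view and the kernel view of descriptors 0..7 */
int dump_desc(const char *s, int w)
{
	int i;

	fprintf(stderr, "-- [pid %d thr %p] %s --\n", (int)getpid(),
		(void *)pthread_self(), s);
	for (i=0; i<8; i++) {
		fprintf(stderr, "fd %d flags 0x%x (system 0x%x)\n", i,
			(unsigned)_thread_fd_getflags(i),
			(unsigned)__sys_fcntl(i, F_GETFL));
	}
	sleep(w);
	return 0;
}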

int
main(int argc, char *argv[])
{
	pid_t p;
	int i, fd[2];

	pipe(fd);
	fprintf(stderr, "child-end %d    parent end %d max %d\n",
		fd[0], fd[1], getdtablesize());
	dump_desc("start main", 0);
	p = fork();
	if (p == 0) { /* child */
		/*
		 * close parent's end. It's a pipe so O_NONBLOCK remains.
		 * You can also do it in the loop below.
		 */
		close(fd[1]);
		/*
		 * First tell libc_r to leave O_NONBLOCK on the descriptors
		 * even after a close() or exec().
		 * _After_ that, close() all descriptors you don't need
		 * in the child, because they are shared and the child
		 * could change their mode in unexpected ways, causing us
		 * trouble.
		 * You can limit the loop (getdtablesize() is often large)
		 * but at least make sure to act on the descriptors that
		 * the parent threads are using in blocking mode.
		 */
		if (argc > 1)
		    for (i=0; i < getdtablesize(); i++) {
			int fl = fcntl(i, F_GETFL);	/* fcntl() returns int */
			if (fl != -1 && i != fd[0]) {
				/* open and must be closed in the child */
				fcntl(i, F_SETFL, O_NONBLOCK | fl);
				close(i);
			}
		    }
		dup2(fd[0], STDOUT_FILENO);
		sleep(2);
		/*
		 * now we can finally exec a process without risking
		 * trouble. The process will only play with its own
		 * side of the pipes, which is not shared by the parent
		 * and so any action on it does not change the status
		 * on the parent side.
		 * The example process below does some weird things
		 * with the descriptors, and we use it to show that it
		 * does not harm us.
		 */
		execl("./simple", "simple", "2", (char *)NULL);
	} else {	/* parent */
		close(fd[0]);	/* close child end of the pipe */
		sleep(1);
		dump_desc("parent", 2);
		dump_desc("parent after exec done", 2);
		dump_desc("parent after child fcntl", 2);
		dump_desc("parent after child dead", 0);
	}
	return 0;
}

--EVF5PPMfhYS0aIcm
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="simple.c"

/*
 * test descriptor issues on threads.
 *
 * compile with cc -o simple simple.c
 */

#include <stdio.h>
#include <stdlib.h>	/* atoi() */
#include <unistd.h>
#include <fcntl.h>

int
main(int argc, char *argv[])
{
	int fd[2];
	/* seconds to sleep; 0 if no argument is given */
	int w = (argc > 1) ? atoi(argv[1]) : 0;

	pipe(fd);
	sleep(w);
	dup2(fd[0], STDOUT_FILENO);
	/* force blocking mode on the three stdio descriptors */
	fcntl(0, F_SETFL, ~O_NONBLOCK & fcntl(0, F_GETFL));
	fcntl(1, F_SETFL, ~O_NONBLOCK & fcntl(1, F_GETFL));
	fcntl(2, F_SETFL, ~O_NONBLOCK & fcntl(2, F_GETFL));
	sleep(w);
	return 0;
}

--EVF5PPMfhYS0aIcm--


