Date:      Wed, 15 Jun 2005 02:54:47 -0700
From:      Luigi Rizzo <rizzo@icir.org>
To:        current@freebsd.org
Subject:   bug or feature in userland thread library (O_NONBLOCK)
Message-ID:  <20050615025447.A62971@xorpc.icir.org>


--+QahgC5+KEYLbs62
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Probably a known issue, but I thought it worth reporting,
if nothing else for archival purposes.

I think our userland thread library (libc_r) has some bugs in
handling descriptors.  I can reproduce the behaviour on -current
and 4.x, and I believe it applies to 5.x too.  

Following is a description of the problem and some code to replicate it.
The code includes a workaround, but it is not particularly nice.

Any better ideas? I am not sure what to do, but perhaps the
only sensible thing is to add a note with this workaround
(or better ones, if available) to our pthreads manpage.

--- PROBLEM DESCRIPTION ---

Basically, our libc_r keeps two views of I/O descriptors. The
"external" view is for threads and reflects the modes requested by
the threads (blocking or not, etc.); the "internal" view is how
descriptors are actually set in the kernel, and there they should
always be set as O_NONBLOCK to avoid blocking on a syscall.
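
In a program linked against the regular C library there is only the
kernel's view, and it can be read and changed with fcntl(). A minimal
sketch follows; the helper names fd_is_nonblocking() and
fd_set_nonblocking() are made up for illustration and are not part of
libc_r.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: report the blocking mode of a descriptor.
 * Returns 1 if O_NONBLOCK is set, 0 if it is clear, -1 if fd is not open.
 */
int
fd_is_nonblocking(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl == -1)
		return -1;
	return (fl & O_NONBLOCK) != 0;
}

/*
 * Hypothetical helper: set (on != 0) or clear (on == 0) O_NONBLOCK,
 * leaving the other file status flags untouched.
 * Returns 0 on success, -1 on error.
 */
int
fd_set_nonblocking(int fd, int on)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl == -1)
		return -1;
	if (on)
		fl |= O_NONBLOCK;
	else
		fl &= ~O_NONBLOCK;
	return fcntl(fd, F_SETFL, fl) == -1 ? -1 : 0;
}
```

Note the read-modify-write in fd_set_nonblocking(): F_SETFL replaces
the whole set of status flags, so the current value has to be fetched
first.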

The bug occurs when a process does a fork(), and then either
a close() or an exec() -- a similar thing also occurs with popen().
The relevant source code is in

    /usr/src/lib/libc_r/uthread/uthread_execve.c
    /usr/src/lib/libc_r/uthread/uthread_close.c

Right before the exec(), the internal descriptors are put into
blocking mode if the external ones are blocking, and they are only
reset to O_NONBLOCK after termination of the child (upon SIGCHLD).
The same occurs for close().

Note that close() has hacks to leave pipes alone, but the same
code is not present in the execve() case where instead I believe
it would be necessary. Another thing to note is that there is
some kind of 'fate sharing' among the stdio descriptors (0, 1, 2)
which is not totally clear to me, but seems to require setting
O_NONBLOCK on all 3 to make sure that they are not changed to
blocking mode.

Because descriptors are shared between parent and child, for the
lifetime of the child the descriptors in the parent will be blocking,
and the scheduling of threads will be completely broken.
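
The sharing itself can be seen without any thread library. The sketch
below (the function name flags_shared_across_fork() is made up for
illustration) sets O_NONBLOCK on a pipe in the parent, lets a forked
child clear it, and then checks what the parent sees. It is only meant
to show that the mode belongs to the shared open file rather than to
each process; it does not involve libc_r at all.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if a mode change made by the child is visible in the
 * parent, 0 if it is not, -1 on error.
 */
int
flags_shared_across_fork(void)
{
	int fd[2], fl, status, ok;
	pid_t p;

	if (pipe(fd) == -1)
		return -1;

	/* parent: put the read end into non-blocking mode */
	fl = fcntl(fd[0], F_GETFL);
	ok = (fl != -1 && fcntl(fd[0], F_SETFL, fl | O_NONBLOCK) != -1);

	p = ok ? fork() : -1;
	if (p == 0) {
		/* child: clear O_NONBLOCK on the inherited descriptor */
		fl = fcntl(fd[0], F_GETFL);
		if (fl == -1 || fcntl(fd[0], F_SETFL, fl & ~O_NONBLOCK) == -1)
			_exit(1);
		_exit(0);
	}
	if (p == -1 || waitpid(p, &status, 0) == -1 ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		ok = 0;

	/* parent: look at the same descriptor again */
	fl = fcntl(fd[0], F_GETFL);
	close(fd[0]);
	close(fd[1]);
	if (!ok || fl == -1)
		return -1;
	return (fl & O_NONBLOCK) == 0;
}
```

A return value of 1 means the parent's descriptor went back to
blocking mode purely because of what the child did, which is the
situation described above.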

The only fix I have found is to act as follows:

        pipe(fd);       /* create a pipe with the child */
        p = fork();
        if (p == 0) { /* child */
            /* call fcntl() _before_ close() to avoid resetting
             * O_NONBLOCK on the internal descriptors. After that,
             * close the descriptors not needed in the child.
             */  
            for (i=0; i < getdtablesize(); i++) {
                long fl = fcntl(i, F_GETFL);
                if (fl != -1 && i != fd[0]) {
                    /* open and must be closed in the child */
                    fcntl(i, F_SETFL, O_NONBLOCK | fl);
                    close(i);
                }
            }
            /* standard stuff (dup2, exec*(), ...) */
            dup2(fd[0], STDOUT_FILENO); /* as an example */
            execl(....);
        } else { /* parent */
            close(fd[0]);       /* close child end. */
            ...
        }

but of course this is rather unintuitive. On the other hand,
I have no idea of a better way to address the problem; since I am
fairly new to threads programming, maybe others know better.
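
If the loop from the workaround above has to be used in more than one
place, it could be wrapped in a small function. This is only a sketch:
nonblock_and_close_range() and its parameters are hypothetical names,
and bounding the range is my own variation so that the caller does not
have to walk all of getdtablesize().

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper for the child side of a fork().
 * For every open descriptor in [lo, hi) except 'keep', first set
 * O_NONBLOCK and only then close it, as in the workaround.
 * Pass keep = -1 to keep nothing.
 * Returns the number of descriptors that were closed.
 */
int
nonblock_and_close_range(int lo, int hi, int keep)
{
	int i, fl, closed = 0;

	for (i = lo; i < hi; i++) {
		if (i == keep)
			continue;
		fl = fcntl(i, F_GETFL);
		if (fl == -1)
			continue;	/* not an open descriptor */
		/* fcntl() _before_ close(), the order matters */
		(void)fcntl(i, F_SETFL, fl | O_NONBLOCK);
		if (close(i) == 0)
			closed++;
	}
	return closed;
}
```

In the child one would call something like
nonblock_and_close_range(0, getdtablesize(), fd[0]) before the dup2()
and exec.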

I am attaching two minimal programs to demonstrate the bug.

simple.c is a simple program (linked against the regular C library)
	cc -o simple simple.c

that only plays with blocking mode on the descriptors.

thre.c is meant to be linked with libc_r.
	cc -o thre thre.c -lc_r

It does a fork and exec of the other program.
If you call it without arguments, it does not implement the
above workaround, and you can see how the 'internal' descriptors
change to blocking mode. If you call it with an argument, it
implements the workaround.

	enjoy
	luigi


--+QahgC5+KEYLbs62
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="thre.c"

/*
 * test descriptor issues on threads.
 *
 * compile with cc -o thre -lc_r thre.c
 */

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>

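/*
 * Print, for the first 8 descriptors, the flags as seen by the threads
 * and as set in the kernel, then sleep for w seconds.
 * _thread_fd_getflags() and __sys_fcntl() are library internals.
 */
int dump_desc(char *s, int w)
{
	int i;

	fprintf(stderr, "-- [pid %d thr %p] %s --\n", (int)getpid(),
		(void *)pthread_self(), s);
	for (i = 0; i < 8; i++) {
		/* cast so the arguments match the %lx conversions */
		fprintf(stderr, "fd %d flags 0x%lx (system 0x%lx)\n", i,
			(unsigned long)_thread_fd_getflags(i),
			(unsigned long)__sys_fcntl(i, F_GETFL));
	}
	sleep(w);
	return 0;
}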

int
main(int argc, char *argv[])
{
	pid_t p;
	int i, fd[2];

	pipe(fd);
	fprintf(stderr, "child-end %d    parent end %d max %d\n",
		fd[0], fd[1], getdtablesize());
	dump_desc("start main", 0);
	p = fork();
	if (p == 0) { /* child */
		/*
		 * close parent's end. It's a pipe so O_NONBLOCK remains.
		 * You can also do it in the loop below.
		 */
		close(fd[1]);
		/*
		 * First tell libc_r to leave O_NONBLOCK on the descriptors
		 * even after a close() or exec().
		 * _After_ that, close() all descriptors you don't need
		 * in the child, because they are shared and the child
		 * could change their mode in unexpected ways, causing us
		 * trouble.
		 * You can limit the loop (getdtablesize() is often large)
		 * but at least make sure to act on the descriptors that
		 * the parent's threads are using in blocking mode.
		 */
		if (argc > 1)
		    for (i=0; i < getdtablesize(); i++) {
			long fl = fcntl(i, F_GETFL);
			if (fl != -1 && i != fd[0]) {
				/* open and must be closed in the child */
				fcntl(i, F_SETFL, O_NONBLOCK | fl);
				close(i);
			}
		    }
		dup2(fd[0], STDOUT_FILENO);
		sleep(2);
		/*
		 * now we can finally exec a process without risking
		 * trouble. The process will only play with its own
		 * side of the pipes, which is not shared by the parent
		 * and so any action on it does not change the status
		 * on the parent side.
		 * The example process below does some weird things
		 * with the descriptors, and we use it to show that it
		 * does not harm us.
		 */
		execl("./simple", "simple", "2", NULL);
	} else {	/* parent */
		close(fd[0]);	/* close child end of the pipe */
		sleep(1);
		dump_desc("parent", 2);
		dump_desc("parent after exec done", 2);
		dump_desc("parent after child fcntl", 2);
		dump_desc("parent after child dead", 0);
	}
	return 0;
}

--+QahgC5+KEYLbs62
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="simple.c"

/*
 * test descriptor issues on threads.
 *
 * compile with cc -o simple simple.c
 */

#include <stdio.h>
#include <stdlib.h>	/* atoi() */
#include <unistd.h>
#include <fcntl.h>

int
main(int argc, char *argv[])
{
	int fd[2];
	/* seconds to sleep; default to 0 if no argument is given */
	int delay = (argc > 1) ? atoi(argv[1]) : 0;

	pipe(fd);
	sleep(delay);
	dup2(fd[0], STDOUT_FILENO);
	/* force the three stdio descriptors into blocking mode */
	fcntl(0, F_SETFL, ~O_NONBLOCK & fcntl(0, F_GETFL));
	fcntl(1, F_SETFL, ~O_NONBLOCK & fcntl(1, F_GETFL));
	fcntl(2, F_SETFL, ~O_NONBLOCK & fcntl(2, F_GETFL));
	sleep(delay);
	return 0;
}

--+QahgC5+KEYLbs62--


