Date:      Mon, 14 Mar 2005 17:31:12 +0100 (CET)
From:      Marc Olzheim <zlo@zlo.nu>, Sven Berkvens <sven@berkvens.net>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   kern/78824: race condition close()ing and read()ing the same socketpair on SMP.
Message-ID:  <200503141631.j2EGVCH2035756@rave.ilse.net>
Resent-Message-ID: <200503141640.j2EGe3GQ036011@freefall.freebsd.org>

>Number:         78824
>Category:       kern
>Synopsis:       race condition close()ing and read()ing the same socketpair on SMP.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Mar 14 16:40:02 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Marc Olzheim, Sven Berkvens
>Release:        FreeBSD 5.4-PRERELEASE i386
>Organization:
ilse media
>Environment:
System: FreeBSD rave.ilse.net 5.4-PRERELEASE FreeBSD 5.4-PRERELEASE #0: Thu Mar 10 15:43:26 CET 2005 root@rave.ilse.net:/usr/obj/usr/src/sys/SE3DEBUG i386

GENERIC + INVARIANTS + INVARIANT_SUPPORT + WITNESS + WITNESS_SKIPSPIN

>Description:
	When read()ing from a socket while the other end is being
	close()d at the same time, read() fails with errno == ENOTCONN,
	instead of doing normal End-of-file handling.

	References:
	soisdisconnected() from
	__FBSDID("$FreeBSD: src/sys/kern/uipc_socket2.c,v 1.137.2.5 2005/02/23 00:39:17 rwatson Exp $");
	soreceive() from
	__FBSDID("$FreeBSD: src/sys/kern/uipc_socket.c,v 1.208.2.17 2005/03/07 13:08:03 rwatson Exp $");
	close() from
	__FBSDID("$FreeBSD: src/sys/kern/kern_descrip.c,v 1.243.2.6 2005/03/03 22:27:32 jhb Exp $");

	It seems as though soreceive() doesn't check for a lock on the
	file descriptor, just the socket buffer, allowing close() to
	modify its flags at the same time.
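
	For comparison, here is a minimal sketch of the End-of-file
	behaviour we expect when there is no race, i.e. when the
	close() of the other end has completed before the read()
	starts. The helper name eof_after_peer_close() and the -2
	setup-failure return value are made up for this illustration;
	they are not part of our test program.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): returns what read() yields
 * on one end of a socketpair after the single queued byte has been
 * consumed and the other end has been fully closed.
 * Returns -2 if the setup itself fails.
 */
ssize_t
eof_after_peer_close(void)
{
	int	sock[2];
	char	buf[1];
	ssize_t	bytes;

	if (socketpair(PF_UNIX, SOCK_STREAM, 0, sock) == -1)
		return -2;

	/* Queue one byte and consume it, as the test program does. */
	if (write(sock[0], "A", 1) != 1 || read(sock[1], buf, 1) != 1)
	{
		close(sock[0]);
		close(sock[1]);
		return -2;
	}

	/* The close() has finished before the second read() begins. */
	close(sock[0]);

	/* Expected: 0 (EOF), not -1 with errno == ENOTCONN. */
	bytes = read(sock[1], buf, 1);
	close(sock[1]);

	return bytes;
}
```

	Called from a single process, this should return 0. The
	problem described above only shows up when the close() and
	the read() run concurrently on different CPUs.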

>How-To-Repeat:
	Since this is heavily timing-dependent (it is a race condition),
	it might not be easily reproduced. We can reproduce it by running
	our code on the following hardware, with no other CPU-intensive
	processes running:

hw.machine: i386
hw.model: Intel(R) Xeon(TM) CPU 3.06GHz
hw.ncpu: 4
hw.byteorder: 1234
hw.clockrate: 3065
kern.ostype: FreeBSD
kern.osrelease: 5.4-PRERELEASE
kern.osrevision: 199506
kern.version: FreeBSD 5.4-PRERELEASE #0: Thu Mar 10 15:43:26 CET 2005
    root@rave.ilse.net:/usr/obj/usr/src/sys/SE3DEBUG

kern.clockrate: { hz = 100, tick = 10000, profhz = 1024, stathz = 128 }
kern.osreldate: 503105
kern.stackprot: 7
kern.ktrace.genio_size: 4096
kern.ktrace.request_pool: 100
kern.sched.name: 4BSD
kern.smp.maxcpus: 16
kern.smp.active: 1
kern.smp.disabled: 0
kern.smp.cpus: 4
kern.smp.forward_signal_enabled: 1
kern.smp.forward_roundrobin_enabled: 1

	When we run the code below under ktrace on our machine, the
	problem is reproduced:

rave:/tmp>echo 'ktrace -i ./socketpair2 < /dev/null' | sh
<Socket is not connected> (3,4) (i:33)
<Socket is not connected> (3,4) (i:48)
<Socket is not connected> (3,4) (i:67)
<Socket is not connected> (3,4) (i:99)
100
<Socket is not connected> (3,4) (i:131)
<Socket is not connected> (3,4) (i:141)
<Socket is not connected> (3,4) (i:144)
<Socket is not connected> (3,4) (i:159)
<Socket is not connected> (3,4) (i:169)
<Socket is not connected> (3,4) (i:176)
<Socket is not connected> (3,4) (i:183)
200
<Socket is not connected> (3,4) (i:213)
<Socket is not connected> (3,4) (i:226)
<Socket is not connected> (3,4) (i:234)
<Socket is not connected> (3,4) (i:254)
<Socket is not connected> (3,4) (i:282)
...

	socketpair2.c:

/* socketpair2.c: -	Marc Olzheim <zlo at zlo.nu>,
 *			Sven Berkvens <sven at berkvens.net>
 */
#include	<errno.h>
#include	<fcntl.h>
#include	<stdio.h>
#include	<stdlib.h>
#include	<string.h>
#include	<signal.h>
#include	<sys/socket.h>
#include	<sys/types.h>
#include	<sys/wait.h>
#include	<unistd.h>

int
main(int argc, char *argv[])
{
	int	sock[2], i, j, wstat;
	char	buf[1024];
	ssize_t	bytes;
	pid_t	newpid;

	if (1 != argc)
	{
		fprintf(stderr, "Usage: %s\n", argv[0]);
		return 1;
	}

	for (i = 0;;++i)
	{
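		if (socketpair(PF_UNIX, SOCK_STREAM, 0, sock))
		{
			perror("socketpair()");
			return 1;
		}

		newpid = fork();
		if (-1 == newpid)
		{
			/* Without a child, the SIGSTOP below would never
			 * be followed by a SIGCONT.
			 */
			perror("fork()");
			return 1;
		}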

		if (0 != newpid)
		{
			/* parent */
			close(sock[1]);

			if (write(sock[0], "A", 1) != 1)
				perror("write()");

			/* Suspend until the child has read the byte. */
			kill(getpid(), SIGSTOP);

			/* We hopefully get a time slice as soon as a
			 * SIGCONT is delivered.
			 */
			close(sock[0]);
		}
		else
		{
			/* child */
			close(sock[0]);

			bytes = read(sock[1], buf, 1);
			if (bytes != 1)
				perror("first read()");

			/* Tell the parent to continue and close his side of
			 * the socket.
			 */
			kill(getppid(), SIGCONT);

			/* Since only 1 byte is sent, this should
			 * produce EOF.
			 */
			bytes = read(sock[1], buf, 1);
			if (bytes == -1)
			{
				printf("<%s> (%d,%d) (i:%d)\n",
					strerror(errno),
					sock[0], sock[1], i);
				exit(1);
			}

			exit(0);
		}

		wait(&wstat);

		if (!(i % 100) && i)
			printf("%d\n", i);
	}

	return 0;
}

>Fix:

	It's possible to catch the ENOTCONN and restart the read() to
	read the EOF...
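
	A minimal sketch of that userland workaround follows. It does
	not fix the kernel. The wrapper name read_retry_enotconn()
	and its max_retries parameter are made up for this
	illustration. The retry count is bounded on the assumption
	that a socket which really is not connected would otherwise
	keep returning ENOTCONN forever.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper (illustration only): behaves like read(), but
 * restarts the call when it fails with ENOTCONN, at most max_retries
 * times. Any other result is returned to the caller unchanged, with
 * errno left as read() set it.
 */
ssize_t
read_retry_enotconn(int fd, void *buf, size_t len, int max_retries)
{
	ssize_t	bytes;
	int	tries = 0;

	for (;;)
	{
		bytes = read(fd, buf, len);

		/* Success, EOF, or an error other than ENOTCONN. */
		if (bytes != -1 || errno != ENOTCONN)
			return bytes;

		/* Give up once the retry budget is used up. */
		if (tries++ >= max_retries)
			return bytes;
	}
}
```

	In socketpair2.c the second read(sock[1], buf, 1) could be
	replaced by read_retry_enotconn(sock[1], buf, 1, 3) to see
	whether the restarted read() returns 0 as expected.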

>Release-Note:
>Audit-Trail:
>Unformatted:


