Date:      Thu, 8 Mar 2001 16:16:11 -0600
From:      Jonathan Lemon <jlemon@flugsvamp.com>
To:        Wietse Venema <wietse@porcupine.org>
Cc:        Jonathan Lemon <jlemon@flugsvamp.com>, itojun@iijlab.net, Arjan.deVet@adv.iae.nl, net@freebsd.org, postfix-users@postfix.org
Subject:   Re: [itojun@iijlab.net: accept(2) behavior with tcp RST right after handshake]
Message-ID:  <20010308161611.B78851@prism.flugsvamp.com>
In-Reply-To: <20010308180048.CC09DBC06D@spike.porcupine.org>
References:  <20010308095759.S41963@prism.flugsvamp.com> <20010308180048.CC09DBC06D@spike.porcupine.org>

On Thu, Mar 08, 2001 at 01:00:48PM -0500, Wietse Venema wrote:
> Jonathan Lemon:
> > On Thu, Mar 08, 2001 at 10:38:17AM -0500, Wietse Venema wrote:
> > > If the result of connect() write() close() depends on whether
> > > accept() happens after or before close(), then the behavior is
> > > broken. The client has received a successful return from write()
> > > and close(). The system is not supposed to lose the data, period.
> > 
> > What you seem to be missing here is that the behavior described
> > above is ONLY specific to UNIX-DOMAIN sockets.  The description
> > above is generally (but not always) true for the TCP/IP protocol.
> 
> The problem is observed with UNIX-domain sockets.
> 
> > Data CAN be lost if the TCP connection is RST.  It has nothing to
> > do with the ordering of accept() with respect to close().
> 
> Please educate me: how would RST come into this discussion at all?
> The client does connect() write() close(), there is no forced
> connection termination involved at all.

Under normal circumstances, a connect(), write(), close() sequence
should work.  However, the code that was added was meant to handle
the abnormal cases from the server's point of view.

As you noted, this happened to break unix-domain sockets under
4.2-stable, because of the following difference in kernel semantics:

    + with unix-domain sockets, the connection is marked as
      DISCONNECTED as soon as the final close() is performed.

    + with TCP/IP sockets, a connection is marked "DISCONNECTING"
      on the final client close, but is NOT actually closed (marked
      as DISCONNECTED) until the server is notified that the client's
      TCP/IP endpoint is gone.

What we are trying to fix here is the case where the server, for some
reason, sees the client forcibly tear down the endpoint before it can
get around to accepting the connection.

From the server's point of view:

    + TCP/IP handshake from client, allocate protocol control blocks
    + receive data from client
    + client resets connection, pcb is destroyed 
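
From user space, the sequence above would show up as accept() failing
with ECONNABORTED, which is the error the patch below returns for TCP.
A minimal sketch of a server-side loop that tolerates it, assuming it
is acceptable for the server to skip such connections silently.  The
names accept_error_is_transient() and accept_skip_aborted() are
hypothetical helpers made up for this illustration:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: errors from accept() that say nothing about the
 * health of the listening socket itself, so the caller can just retry. */
int accept_error_is_transient(int err)
{
    return err == ECONNABORTED || err == EINTR;
}

/* Hypothetical helper: accept one connection, skipping connections that
 * were torn down while they were still sitting in the listen queue.
 * Returns the new descriptor, or -1 with errno set on a real failure. */
int accept_skip_aborted(int listen_fd)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);

        if (fd >= 0)
            return fd;
        if (!accept_error_is_transient(errno))
            return -1;  /* real failure; errno is left for the caller */
    }
}
```

A server would call accept_skip_aborted() wherever it currently calls
accept(), and only treat a -1 return as a problem with the listening
socket.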

Exactly why the client resets the connection isn't my concern at
the moment.  Some stacks may place a timeout on the FIN_WAIT state,
and forcibly reset the connection when the timer expires.
Alternatively, the client may crash, and then send a RST in response
to an ACK transmitted by the server.  Or the other end may have set
SO_LINGER with a zero linger time, which will cause close() to send
a RST.
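
For reference, the SO_LINGER case is easy to produce on purpose.  A
minimal sketch, assuming a connected TCP socket; set_abortive_close()
is a hypothetical name chosen for this illustration.  Setting l_onoff
to 1 with l_linger of 0 asks the stack to abort the connection on
close() rather than go through the normal FIN exchange:

```c
#include <sys/socket.h>

/* Hypothetical helper: make a later close(fd) abort the connection
 * (RST on TCP) instead of closing it gracefully.
 * Returns 0 on success, -1 with errno set on failure. */
int set_abortive_close(int fd)
{
    struct linger lg;

    lg.l_onoff = 1;   /* enable lingering on close() ... */
    lg.l_linger = 0;  /* ... with a zero timeout, i.e. abort at once */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

A client that calls this before close() gives the server exactly the
"client resets connection" step, with no crash or timer involved.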

The unix-domain bug arose because we were treating the DISCONNECTED
state as if it meant the same thing across all protocols, which turns
out not to be the case.
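
The unix-domain case can be exercised from user space with something
like the following.  This is only a sketch: the function name
unix_close_before_accept() and the socket path passed to it are
assumptions made for illustration, and it relies on a unix-domain
connect() completing before the server has called accept().  With the
fix applied, the accept() should succeed and the read() should return
the client's data; on an affected kernel the accept() would be
expected to fail with ECONNABORTED instead.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical test helper.  The client connects, writes msg and
 * closes BEFORE the server calls accept().  Returns the number of
 * bytes the server then reads into buf, or -1 if any step
 * (including accept()) fails. */
ssize_t unix_close_before_accept(const char *path, const char *msg,
                                 char *buf, size_t buflen)
{
    struct sockaddr_un sa;
    ssize_t n = -1;
    int lfd = -1, cfd = -1, afd = -1;

    if (strlen(path) >= sizeof(sa.sun_path))
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);
    unlink(path);                   /* remove a stale socket file */

    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0)
        goto done;
    if (bind(lfd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(lfd, 1) < 0)
        goto done;

    /* Client side: connect(), write(), close(), all before accept(). */
    if (connect(cfd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        write(cfd, msg, strlen(msg)) != (ssize_t)strlen(msg))
        goto done;
    close(cfd);
    cfd = -1;

    /* Server side: the queued connection was already closed by the peer. */
    afd = accept(lfd, NULL, NULL);
    if (afd >= 0)
        n = read(afd, buf, buflen);
done:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    unlink(path);
    return n;
}
```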

As for any data that already exists in the socket buffer on the
server when the connection is aborted, I believe that the correct
thing to do is discard it.  This is the historical precedent, and
is supported by the current standards.

Below is a patch that will fix the behavior for unix-domain sockets.
--
Jonathan


Index: kern/uipc_socket.c
===================================================================
RCS file: /ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.68.2.13
diff -u -r1.68.2.13 uipc_socket.c
--- kern/uipc_socket.c	2001/02/26 04:23:16	1.68.2.13
+++ kern/uipc_socket.c	2001/03/08 02:34:00
@@ -360,10 +360,7 @@
 	if ((so->so_state & SS_NOFDREF) == 0)
 		panic("soaccept: !NOFDREF");
 	so->so_state &= ~SS_NOFDREF;
- 	if ((so->so_state & SS_ISDISCONNECTED) == 0)
-		error = (*so->so_proto->pr_usrreqs->pru_accept)(so, nam);
-	else
-		error = ECONNABORTED;
+	error = (*so->so_proto->pr_usrreqs->pru_accept)(so, nam);
 	splx(s);
 	return (error);
 }
Index: netinet/tcp_usrreq.c
===================================================================
RCS file: /ncvs/src/sys/netinet/tcp_usrreq.c,v
retrieving revision 1.51
diff -u -r1.51 tcp_usrreq.c
--- netinet/tcp_usrreq.c	2000/01/09 19:17:28	1.51
+++ netinet/tcp_usrreq.c	2001/03/08 16:21:28
@@ -417,6 +417,10 @@
 	struct inpcb *inp = sotoinpcb(so);
 	struct tcpcb *tp;
 
+	if (so->so_state & SS_ISDISCONNECTED) {
+		error = ECONNABORTED;
+		goto out;
+	}
 	COMMON_START();
 	in_setpeeraddr(so, nam);
 	COMMON_END(PRU_ACCEPT);
@@ -431,6 +435,10 @@
 	struct inpcb *inp = sotoinpcb(so);
 	struct tcpcb *tp;
 
+	if (so->so_state & SS_ISDISCONNECTED) {
+		error = ECONNABORTED;
+		goto out;
+	}
 	COMMON_START();
 	in6_mapped_peeraddr(so, nam);
 	COMMON_END(PRU_ACCEPT);
