From owner-freebsd-net@FreeBSD.ORG  Mon Apr  8 12:13:49 2013
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 11219A4B
 for <freebsd-net@freebsd.org>; Mon,  8 Apr 2013 12:13:49 +0000 (UTC)
 (envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2])
 by mx1.freebsd.org (Postfix) with ESMTP id 6D81BA4B
 for <freebsd-net@freebsd.org>; Mon,  8 Apr 2013 12:13:48 +0000 (UTC)
Received: (qmail 71522 invoked from network); 8 Apr 2013 13:21:21 -0000
Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2])
 (envelope-sender <andre@freebsd.org>)
 by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP
 for <matt@matthewjmiller.net>; 8 Apr 2013 13:21:21 -0000
Message-ID: <5162B474.6060808@freebsd.org>
Date: Mon, 08 Apr 2013 14:13:40 +0200
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:17.0) Gecko/20130307 Thunderbird/17.0.4
MIME-Version: 1.0
To: Matt Miller <matt@matthewjmiller.net>
Subject: Re: panic in tcp_do_segment()
References: <1043692819.529554.1365114790772.JavaMail.root@erie.cs.uoguelph.ca>
 <CAFc6gu9bmJQ-Uxv7PzspSLDyFz6zE3fV=u7HEumCh6f=YQLU2Q@mail.gmail.com>
In-Reply-To: <CAFc6gu9bmJQ-Uxv7PzspSLDyFz6zE3fV=u7HEumCh6f=YQLU2Q@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Juan Mojica <jmojica@gmail.com>,
 Rick Macklem <rmacklem@uoguelph.ca>
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Apr 2013 12:13:49 -0000

On 05.04.2013 13:09, Matt Miller wrote:
> Hey Rick,
>
> I believe Juan and I have root caused this crash recently.  The t_state =
> 0x1, TCPS_LISTEN, in the link provided at the time of the assertion.
>
> In tcp_input(), if we're in TCPS_LISTEN, SO_ACCEPTCONN should be set on the
> socket and we should never enter tcp_do_segment() for this state.  I think
> if you look in your corefile, you'll see the socket *doesn't* have this
> flag set in your case.
>
> 1043         /*
> 1044          * When the socket is accepting connections (the INPCB is in
> LISTEN
> 1045          * state) we look into the SYN cache if this is a new
> connection
> 1046          * attempt or the completion of a previous one.  Because listen
> 1047          * sockets are never in TCPS_ESTABLISHED, the V_tcbinfo lock
> will be
> 1048          * held in this case.
> 1049          */
> 1050         if (so->so_options & SO_ACCEPTCONN) {
> 1051                 struct in_conninfo inc;
> 1052
> 1053                 KASSERT(tp->t_state == TCPS_LISTEN, ("%s: so accepting
> but "
> 1054                     "tp not listening", __func__));
> ...
> 1356                 syncache_add(&inc, &to, th, inp, &so, m, NULL, NULL);
> 1357                 /*
> 1358                  * Entry added to syncache and mbuf consumed.
> 1359                  * Everything already unlocked by syncache_add().
> 1360                  */
> 1361                 INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
> 1362                 return;
> 1363         }
> ...
> 1384         /*
> 1385          * Segment belongs to a connection in SYN_SENT, ESTABLISHED or
> later
> 1386          * state.  tcp_do_segment() always consumes the mbuf chain,
> unlocks
> 1387          * the inpcb, and unlocks pcbinfo.
> 1388          */
> 1389         tcp_do_segment(m, th, so, tp, drop_hdrlen, tlen, iptos,
> ti_locked);
>
> I think this has to do with this patch in soclose() where SO_ACCEPTCONN is
> being turned off in soclose().  I suspect if you look at the other threads
> in your corefile, you'll see one at this point in soclose() operating on
> the same socket as the one in the tcp_do_segment() thread.
>
> http://svnweb.freebsd.org/base?view=revision&revision=243627
>
>   817                 /*
>   818                  * Prevent new additions to the accept queues due
>   819                  * to ACCEPT_LOCK races while we are draining them.
>   820                  */
>   821                 so->so_options &= ~SO_ACCEPTCONN;
>   822                 while ((sp = TAILQ_FIRST(&so->so_incomp)) != NULL) {
>   823                         TAILQ_REMOVE(&so->so_incomp, sp, so_list);
>   824                         so->so_incqlen--;
>   825                         sp->so_qstate &= ~SQ_INCOMP;
>   826                         sp->so_head = NULL;
>   827                         ACCEPT_UNLOCK();
>   828                         soabort(sp);
>   829                         ACCEPT_LOCK();
>   830                 }
>
> Juan had evaluated this code path and it seemed safe to just drop the
> packet in this case:
>
> +     /*
> +      * In closing down the socket, the SO_ACCEPTCONN flag is removed to
> +      * prevent new connections from being established.  This means that
> +      * any frames in that were in the midst of being processed could
> +      * make it here.  Need to just drop the frame.
> +      */
> +     if (TCPS_LISTEN == tp->t_state) {
> +         TCPSTAT_INC(tcps_rcvwhileclosing);
> +         goto drop;
> +     }
>        KASSERT(tp->t_state > TCPS_LISTEN, ("%s: TCPS_LISTEN",
>            __func__));
>
> Or, if there's someone more familiar with the locking in these paths, they
> may be able to come up with a way to restructure the locks and logic to
> close this window.

Matt, Juan,

excellent analysis.  I don't see a better approach to handle this
under the current ACCEPT_LOCK model.

Compared to your patch I'd like to handle this race earlier before
we hit tcp_do_segment().

Could you please review the attached patch which handles it right
after the SO_ACCEPTCONN / syncache check?

-- 
Andre

Index: netinet/tcp_input.c
===================================================================
--- netinet/tcp_input.c	(revision 249253)
+++ netinet/tcp_input.c	(working copy)
@@ -1351,6 +1351,16 @@
  		 */
  		INP_INFO_UNLOCK_ASSERT(&V_tcbinfo);
  		return;
+	} else if (tp->t_state == TCPS_LISTEN) {
+		/*
+		 * When a listen socket is torn down the SO_ACCEPTCONN
+		 * flag is removed first while connections are drained
+		 * from the accept queue in a unlock/lock cycle of the
+		 * ACCEPT_LOCK, opening a race condition allowing a SYN
+		 * attempt go through unhandled.
+		 */
+		TCPSTAT_INC(tcps_rcvdwhileclosing);
+		goto drop;
  	}

  #ifdef TCP_SIGNATURE