From: Hans Petter Selasky <hps@selasky.org>
To: Matthew Macy, freebsd-current@freebsd.org, freebsd-net@freebsd.org, Gleb Smirnoff
Subject: Re: callout_drain either broken or man page needs updating
Date: Fri, 15 Jul 2016 06:25:41 +0200

On 07/15/16 05:45, Matthew Macy wrote:
> glebius last commit needs some further re-work.

Hi,

Glebius' commit needs to be backed out, at least the API change that alters the return value of callout_stop() when the callout is both scheduled and being serviced. The reason is simply that there is code out there, as Matthew and others have discovered, which is "refcounting" on callout_reset() and expects a subsequent callout_stop() to return 1 to "unref".

If you consider this impossible, maybe a fourth return value is needed for CANCELLED and DRAINING.
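Roughly, the pattern in question looks like the following. This is only a hypothetical sketch with made-up foo_* names, not code from any particular driver, and it assumes the callout was initialized with callout_init_mtx():

> #include <sys/param.h>
> #include <sys/systm.h>
> #include <sys/kernel.h>
> #include <sys/lock.h>
> #include <sys/mutex.h>
> #include <sys/callout.h>
>
> struct foo_softc {
>         struct mtx      fs_mtx;
>         struct callout  fs_callout;
>         u_int           fs_refs;        /* one reference per pending callout */
> };
>
> static void
> foo_attach(struct foo_softc *sc)
> {
>         mtx_init(&sc->fs_mtx, "foo", NULL, MTX_DEF);
>         callout_init_mtx(&sc->fs_callout, &sc->fs_mtx, 0);
>         sc->fs_refs = 0;
> }
>
> static void
> foo_timeout(void *arg)
> {
>         struct foo_softc *sc = arg;
>
>         /* fs_mtx is held here because of callout_init_mtx(). */
>         /* ... do the periodic work ... */
>         sc->fs_refs--;          /* the scheduled handler has run */
> }
>
> static void
> foo_arm(struct foo_softc *sc)
> {
>         mtx_assert(&sc->fs_mtx, MA_OWNED);
>         /* A return value of 0 means no pending callout was rescheduled. */
>         if (callout_reset(&sc->fs_callout, hz, foo_timeout, sc) == 0)
>                 sc->fs_refs++;
> }
>
> static void
> foo_disarm(struct foo_softc *sc)
> {
>         mtx_assert(&sc->fs_mtx, MA_OWNED);
>         /*
>          * The "unref" relies on callout_stop() returning 1 exactly
>          * when a scheduled handler was removed and will never run.
>          * If that return value changes for a callout which is
>          * scheduled while being serviced, the reference taken in
>          * foo_arm() leaks.
>          */
>         if (callout_stop(&sc->fs_callout) == 1)
>                 sc->fs_refs--;
> }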
Further, getting the callouts straight in the TCP stack is a matter of doing the locking correctly, which some have called "my magic bullet", and not of the return values. In the following revision I've proposed adding a new callout API that accepts a locking function, so that the callout code can run its cancelled checks at the right place in situations where more than one lock is needed:

https://svnweb.freebsd.org/changeset/base/302768

Consider this case:

> void
> tcp_timer_2msl(void *xtp)
> {
>         struct tcpcb *tp = xtp;
>         struct inpcb *inp;
>         CURVNET_SET(tp->t_vnet);
> #ifdef TCPDEBUG
>         int ostate;
>
>         ostate = tp->t_state;
> #endif
>         INP_INFO_RLOCK(&V_tcbinfo);
>         inp = tp->t_inpcb;
>         KASSERT(inp != NULL, ("%s: tp %p tp->t_inpcb == NULL", __func__, tp));
>         INP_WLOCK(inp);
>         tcp_free_sackholes(tp);
>         if (callout_pending(&tp->t_timers->tt_2msl) ||
>             !callout_active(&tp->t_timers->tt_2msl)) {

Here we have a custom in-house race check that affects neither the return value of callout_reset() nor that of callout_stop().

>                 INP_WUNLOCK(tp->t_inpcb);
>                 INP_INFO_RUNLOCK(&V_tcbinfo);
>                 CURVNET_RESTORE();
>                 return;

I propose the following solution:

> static void
> tcp_timer_2msl_lock(void *xtp, int do_lock)
> {
>         struct tcpcb *tp = xtp;
>         struct inpcb *inp;
>
>         inp = tp->t_inpcb;
>
>         if (do_lock) {
>                 CURVNET_SET(tp->t_vnet);
>                 INP_INFO_RLOCK(&V_tcbinfo);
>                 INP_WLOCK(inp);
>         } else {
>                 INP_WUNLOCK(inp);
>                 INP_INFO_RUNLOCK(&V_tcbinfo);
>                 CURVNET_RESTORE();
>         }
> }
>
> callout_init_lock_function(&callout, &tcp_timer_2msl_lock, CALLOUT_RETURNUNLOCKED);

Then in softclock_call_cc() it will look like this:

>         CC_UNLOCK(cc);
>         if (c_lock != NULL) {
>                 if (have locking function)
>                         tcp_timer_2msl_lock(c_arg, 1);
>                 else
>                         class->lc_lock(c_lock, lock_status);
>                 /*
>                  * The callout may have been cancelled
>                  * while we switched locks.
>                  */

Actually, "CC_LOCK(cc)" should be taken in front of the cc_exec_cancel() check, to avoid races when testing, setting and clearing this variable, like it is done in hps_head.

>                 if (cc_exec_cancel(cc, direct)) {
>                         if (have locking function)
>                                 tcp_timer_2msl_lock(c_arg, 0);
>                         else
>                                 class->lc_unlock(c_lock);
>                         goto skip;
>                 }
>                 cc_exec_cancel(cc, direct) = true;
>
> ....
>
> skip:
>         if ((c_iflags & CALLOUT_RETURNUNLOCKED) == 0) {
>                 if (have locking function)
>                         ...
>                 else
>                         class->lc_unlock(c_lock);
>         }

The whole point of this is to make the cancelled check atomic:

1) Lock TCP
2) Lock CC_LOCK()
3) Change the callout state

--HPS
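PS: To make the ordering above more concrete, the cancel test in softclock_call_cc() could then look roughly like this. This is only a sketch: cc, direct, c_arg, class, c_lock, lock_status, CC_LOCK(), CC_UNLOCK() and cc_exec_cancel() are the existing callout internals quoted above, while c_lock_fn is just a placeholder for however the locking function would be stored in the callout:

>         CC_UNLOCK(cc);
>         /* 1) Take the consumer's locks, e.g. the TCP locks. */
>         if (c_lock_fn != NULL)
>                 c_lock_fn(c_arg, 1);
>         else if (c_lock != NULL)
>                 class->lc_lock(c_lock, lock_status);
>         /*
>          * 2) Re-take the per-CPU callout lock, so that the test and
>          * the update of the cancel flag below happen atomically.
>          */
>         CC_LOCK(cc);
>         if (cc_exec_cancel(cc, direct)) {
>                 /* Cancelled while we switched locks: undo and skip. */
>                 CC_UNLOCK(cc);
>                 if (c_lock_fn != NULL)
>                         c_lock_fn(c_arg, 0);
>                 else if (c_lock != NULL)
>                         class->lc_unlock(c_lock);
>                 goto skip;
>         }
>         /* 3) Change the callout state while both locks are held. */
>         cc_exec_cancel(cc, direct) = true;
>         CC_UNLOCK(cc);
>         /* ... invoke the handler with only the consumer's locks held ... */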