Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 26 Nov 2013 17:27:11 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        John Baldwin <jhb@FreeBSD.org>, freebsd-hackers@FreeBSD.org
Cc:        Adrian Chadd <adrian@FreeBSD.org>, Bernhard Schmidt <bschmidt@FreeBSD.org>, Luigi Rizzo <luigi@FreeBSD.org>, ray@FreeBSD.org, Andrew Thompson <thompsa@FreeBSD.org>, Pyun YongHyeon <yongari@FreeBSD.org>
Subject:   Re: taskqueue_block
Message-ID:  <5294BDCF.1050702@FreeBSD.org>
In-Reply-To: <201311211414.06849.jhb@freebsd.org>
References:  <5287BDB9.10201@FreeBSD.org> <528B7681.6090806@FreeBSD.org> <CAJ-Vmon5AuBDO8q3uddSnvqBTq71r9vW66DAk9oVpLKUUbX0mA@mail.gmail.com> <201311211414.06849.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
on 21/11/2013 21:14 John Baldwin said the following:
> On Tuesday, November 19, 2013 10:29:18 pm Adrian Chadd wrote:
>> Yes, and lets fix this. :)
> 
> Hmm, is taskqueue_block() always used in context where waiting is safe?

I reviewed usage of taskqueue_block() and it appears that in all cases except
for if_ath (which I'd like to discuss separately) the function is a part of the
anti-pattern taskqueue_block + taskqueue_drain.
That, according to my understanding, means that a) taskqueue_block() would be
safe to use if it were blocking and b) in many cases it should not be necessary
at all.

Here are some annotated patches:
================================================
--- a/sys/dev/jme/if_jme.c
+++ b/sys/dev/jme/if_jme.c
@@ -2259,6 +2259,9 @@ jme_link_task(void *arg, int pending)
 	jme_stop_rx(sc);
 	jme_stop_tx(sc);

+	/* Unblock execution of task. */
+	taskqueue_unblock(sc->jme_tq);
+
 	/* XXX Drain all queued tasks. */
 	JME_UNLOCK(sc);
 	taskqueue_drain(sc->jme_tq, &sc->jme_int_task);
@@ -2332,8 +2335,6 @@ jme_link_task(void *arg, int pending)
 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
 	ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
 	callout_reset(&sc->jme_tick_ch, hz, jme_tick, sc);
-	/* Unblock execution of task. */
-	taskqueue_unblock(sc->jme_tq);
 	/* Reenable interrupts. */
 	CSR_WRITE_4(sc, JME_INTR_MASK_SET, JME_INTRS);

================================================
I am not sure if taskqueue_block is really needed in this driver.  I took a
safer assumption and decided that it could be needed while jme_stop_rx, etc are
being called.  But the taskqueue should not be block over a call to taskqueue_drain.

This patch could easily be wrong and some further reshuffling of the code could
be needed, but the patch shows the direction.
Particularly, I am concerned about possible interaction between
JME_LOCK/JME_UNLOCK and taskqueue_block.

================================================
--- a/sys/dev/netmap/if_em_netmap.h
+++ b/sys/dev/netmap/if_em_netmap.h
@@ -53,13 +53,10 @@ em_netmap_block_tasks(struct adapter *adapter)
 		struct rx_ring *rxr = adapter->rx_rings;

 		for (i = 0; i < adapter->num_queues; i++, txr++, rxr++) {
-			taskqueue_block(txr->tq);
 			taskqueue_drain(txr->tq, &txr->tx_task);
-			taskqueue_block(rxr->tq);
 			taskqueue_drain(rxr->tq, &rxr->rx_task);
 		}
 	} else {	/* legacy */
-		taskqueue_block(adapter->tq);
 		taskqueue_drain(adapter->tq, &adapter->link_task);
 		taskqueue_drain(adapter->tq, &adapter->que_task);
 	}
@@ -69,18 +66,6 @@ em_netmap_block_tasks(struct adapter *adapter)
 static void
 em_netmap_unblock_tasks(struct adapter *adapter)
 {
-	if (adapter->msix > 1) {
-		struct tx_ring *txr = adapter->tx_rings;
-		struct rx_ring *rxr = adapter->rx_rings;
-		int i;
-
-		for (i = 0; i < adapter->num_queues; i++) {
-			taskqueue_unblock(txr->tq);
-			taskqueue_unblock(rxr->tq);
-		}
-	} else { /* legacy */
-		taskqueue_unblock(adapter->tq);
-	}
 }


diff --git a/sys/dev/netmap/if_lem_netmap.h b/sys/dev/netmap/if_lem_netmap.h
index 25e5c7c..76c8246 100644
--- a/sys/dev/netmap/if_lem_netmap.h
+++ b/sys/dev/netmap/if_lem_netmap.h
@@ -60,7 +60,6 @@ lem_netmap_reg(struct ifnet *ifp, int onoff)
 	ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);

 #ifndef EM_LEGACY_IRQ // XXX do we need this ?
-	taskqueue_block(adapter->tq);
 	taskqueue_drain(adapter->tq, &adapter->rxtx_task);
 	taskqueue_drain(adapter->tq, &adapter->link_task);
 #endif /* !EM_LEGCY_IRQ */
@@ -83,10 +82,6 @@ fail:
 		lem_init_locked(adapter);	/* also enable intr */
 	}

-#ifndef EM_LEGACY_IRQ
-	taskqueue_unblock(adapter->tq); // XXX do we need this ?
-#endif /* !EM_LEGCY_IRQ */
-
 	EM_CORE_UNLOCK(adapter);

 	return (error);
================================================
taskqueue_block that is immediately followed by taskqueue_drain is completely
redundant (in addition to being deadlock prone).  So this is an easy case.

================================================
--- a/sys/dev/rt/if_rt.c
+++ b/sys/dev/rt/if_rt.c
@@ -810,22 +810,17 @@ rt_stop_locked(void *priv)
 	ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
 	callout_stop(&sc->periodic_ch);
 	callout_stop(&sc->tx_watchdog_ch);
+
+	/* disable interrupts */
+	RT_WRITE(sc, GE_PORT_BASE + FE_INT_ENABLE, 0);
+
 	RT_SOFTC_UNLOCK(sc);
-	taskqueue_block(sc->taskqueue);

-	/*
-	 * Sometime rt_stop_locked called from isr and we get panic
-	 * When found, I fix it
-	 */
-#ifdef notyet
 	taskqueue_drain(sc->taskqueue, &sc->rx_done_task);
 	taskqueue_drain(sc->taskqueue, &sc->tx_done_task);
 	taskqueue_drain(sc->taskqueue, &sc->periodic_task);
-#endif
-	RT_SOFTC_LOCK(sc);

-	/* disable interrupts */
-	RT_WRITE(sc, GE_PORT_BASE + FE_INT_ENABLE, 0);
+	RT_SOFTC_LOCK(sc);

 	/* reset adapter */
 	RT_WRITE(sc, GE_PORT_BASE + FE_RST_GLO, PSE_RESET);
================================================

I am not sure why the calls to taskqueue_drain were commented out.  In my
opinion they are correct.  I was not able to find any possible way for
rt_stop_locked to be called from an ISR as the comment alleged.
I suspect that the problem was actually caused by the taskqueue_block +
taskqueue_drain deadlock.
Also, I think that disabling interrupts before calling taskqueue_drain should
ensure that no new tasks will be enqueued, but I could be wrong about this.

================================================
--- a/sys/net80211/ieee80211_proto.c
+++ b/sys/net80211/ieee80211_proto.c
@@ -1207,14 +1207,12 @@ update_chw(void *arg, int npending)
 void
 ieee80211_waitfor_parent(struct ieee80211com *ic)
 {
-	taskqueue_block(ic->ic_tq);
 	ieee80211_draintask(ic, &ic->ic_parent_task);
 	ieee80211_draintask(ic, &ic->ic_mcast_task);
 	ieee80211_draintask(ic, &ic->ic_promisc_task);
 	ieee80211_draintask(ic, &ic->ic_chan_task);
 	ieee80211_draintask(ic, &ic->ic_bmiss_task);
 	ieee80211_draintask(ic, &ic->ic_chw_task);
-	taskqueue_unblock(ic->ic_tq);
 }

 /*
================================================

Again, this is an obvious case of the wrong taskqueue_block usage.

I've CC-ed the developers who seem to have worked recently on the affected code.
I would like to ask you to review and/or test the changes.  My understanding of
the code is based on a quick glance, so I could be very wrong.  In that case, I
would like to discuss how the code actually works and what it expects to achieve
by using taskqueue_block.

Please see the quoted below text for my thoughts on why using taskqueue_block +
taskqueue_drain is a problem.

Thank you!

>> On 19 November 2013 06:32, Andriy Gapon <avg@freebsd.org> wrote:
>>>
>>> Forwarding this to the larger audience for a discussion.
>>>
>>> -------- Original Message --------
>>> Message-ID: <5287BDB9.10201@FreeBSD.org>
>>> Date: Sat, 16 Nov 2013 20:47:21 +0200
>>> From: Andriy Gapon <avg@FreeBSD.org>
>>> Subject: taskqueue_block
>>>
>>>
>>>
>>> It seems that either I do not understand something about taskqueue_block code or
>>> it is a quite dangerous and abused API.  The fact that it is not properly
>>> documented does not help either.
>>>
>>> The commit message said:
>>>> Implement taskqueue_block() and taskqueue_unblock(). These functions allow the
>>>> owner of a queue to block and unblock execution of the tasks in the queue while
>>>> allowing tasks to continue to be added queue. Combining this with
>>>> taskqueue_drain() allows a queue to be safely disabled. The unblock function may
>>> [...]
>>>
>>> I indeed see this (anti?) pattern being used in the code.
>>> But what about the following case.   One thread calls taskqueue_block() and sets
>>> TQ_FLAGS_BLOCKED.  Another thread calls taskqueue_enqueue, this adds a task to
>>> the queue and sets ta_pending of the task to 1.  tq_enqueue is not called, so an
>>> actual queue runner is not called or waken up.   Then the first thread calls
>>> taskqueue_drain() on the task.  As far as I can see, the thread would then just
>>> wait forever because the task is pending and is not going to be executed.
>>>
>>> Additionally, it is impossible to reason about the taskqueue's state after
>>> taskqueue_block call, because the call just sets the flag and does not do any
>>> synchronization.  And as described above, it is not safe to call APIs that could
>>> allow the taskqueue or the task state to become known.
>>>
>>> I think that taskqueue_block() should wait on the currently active tasks to
>>> complete.  I don't think that this behavior could be optional.  I do see any
>>> reasonable and safe use for "non-blocking" taskqueue_block().
>>> taskqueue_drain() calls after taskqueue_block() must be removed.  The code
>>> should either use taskqueue_drain() or "blocking" taskqueue_block() depending on
>>> concrete circumstances.
>>>
>>> What do you think?
>>> Thank you.


-- 
Andriy Gapon



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5294BDCF.1050702>