Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 17 Aug 2011 23:59:55 +0000 (UTC)
From:      Adrian Chadd <adrian@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-user@freebsd.org
Subject:   svn commit: r224952 - user/adrian/if_ath_tx/sys/dev/ath
Message-ID:  <201108172359.p7HNxt7K028186@svn.freebsd.org>

next in thread | raw e-mail | index | archive | help
Author: adrian
Date: Wed Aug 17 23:59:55 2011
New Revision: 224952
URL: http://svn.freebsd.org/changeset/base/224952

Log:
  Close up any BAW races as much as possible; document the current problem
  and the workaround.
  
  In the first cut of this code, packet scheduling (ie, stuffing into a
  software or hardware TXQ) occured in the same context as the serialised
  TX from the net80211/netif code. When addba was set, the TID could
  be paused, and anything queued into the software queue would sit there
  and wait. Once the TID was unpaused, all those queued packets with
  sequence numbers would fall into an aggregate TID, and their sequence
  numbers would happily fall into the BAW.
  
  Now, packet scheduling occurs in the task context rather than the TX
  context. So when the TID is paused, any currently running packet
  scheduler/queue function in the ath task may be running concurrently.
  So it's possible that some packets would be dumped to the hardware
  before the TID pause value was checked.
  
  This commit (mostly) fixes the races in checking the TID paused flag.
  It doesn't check for races when forming aggregates; I'll worry about
  that later when I'm actively making aggregate frames. But as the
  TID isn't being locked for the entire duration of the packet scheduling
  and hardware queue, it's quite possible one will leak by between when
  TID->paused is set, and the next time it's checked.
  
  I'm also "sliding" the left edge of the BAW along to match whatever
  sequence numbers net80211 assigns between addba being setup and
  addba being actually enabled. The (likely) correct way to handle this
  is to queue frames during addba setup for this TID as aggregates without
  creating sequence numbers for them. If the TID pause is handled correctly,
  this should occur.
  
  This needs to be revisited and fixed before the code is merged into
  -HEAD.

Modified:
  user/adrian/if_ath_tx/sys/dev/ath/if_ath_tx.c

Modified: user/adrian/if_ath_tx/sys/dev/ath/if_ath_tx.c
==============================================================================
--- user/adrian/if_ath_tx/sys/dev/ath/if_ath_tx.c	Wed Aug 17 20:55:56 2011	(r224951)
+++ user/adrian/if_ath_tx/sys/dev/ath/if_ath_tx.c	Wed Aug 17 23:59:55 2011	(r224952)
@@ -1104,7 +1104,7 @@ ath_tx_start(struct ath_softc *sc, struc
 	/* Don't do it whilst pending; the net80211 layer still assigns them */
 	if (is_ampdu_tx) {
 		/*
-		 * Always set a seqno; this function will
+		 * Always call; this function will
 		 * handle making sure that null data frames
 		 * don't get a sequence number from the current
 		 * TID and thus mess with the BAW.
@@ -1676,6 +1676,9 @@ ath_tx_tid_sched(struct ath_softc *sc, s
 
 	ATH_TXQ_LOCK_ASSERT(txq);
 
+	if (atid->paused)
+		return;		/* paused, can't schedule yet */
+
 	if (atid->sched)
 		return;		/* already scheduled */
 
@@ -2564,11 +2567,24 @@ ath_tx_tid_hw_queue_aggr(struct ath_soft
 
 	for (;;) {
 		ATH_TXQ_LOCK(atid);
+
+		/*
+		 * If the upper layer has paused the TID, don't
+		 * queue any further packets.
+		 *
+		 * This can also occur from the completion task because
+		 * of packet loss; but as its serialised with this code,
+		 * it won't "appear" half way through queuing packets.
+		 */
+		if (atid->paused)
+			break;
+
                 bf = STAILQ_FIRST(&atid->axq_q);
 		if (bf == NULL) {
 			ATH_TXQ_UNLOCK(atid);
 			break;
 		}
+
 		DPRINTF(sc, ATH_DEBUG_SW_TX, "%s: bf=%p: tid=%d\n",
 		    __func__, bf, bf->bf_state.bfs_tid);
 		if (bf->bf_state.bfs_tid != tid)
@@ -2656,11 +2672,20 @@ ath_tx_tid_hw_queue_norm(struct ath_soft
 
 	for (;;) {
 		ATH_TXQ_LOCK(atid);
+
+		/*
+		 * If the upper layers have paused the TID, don't
+		 * queue any further packets.
+		 */
+		if (atid->paused)
+			break;
+
 		bf = STAILQ_FIRST(&atid->axq_q);
 		if (bf == NULL) {
 			ATH_TXQ_UNLOCK(atid);
 			break;
 		}
+
 		ATH_TXQ_REMOVE_HEAD(atid, bf_list);
 		ATH_TXQ_UNLOCK(atid);
 
@@ -2817,9 +2842,32 @@ ath_addba_request(struct ieee80211_node 
 	struct ath_node *an = ATH_NODE(ni);
 	struct ath_tid *atid = &an->an_tid[tid];
 
-	DPRINTF(sc, ATH_DEBUG_SW_TX_CTRL, "%s: called\n", __func__);
+	/*
+	 * XXX This isn't enough.
+	 *
+	 * The taskqueue may be running and scheduling some more packets.
+	 * It acquires the TID lock to serialise access to the TID paused
+	 * flag but as the rest of the code doesn't hold the TID lock
+	 * for the duration of any activity (outside of adding/removing
+	 * items from the software queue), it can't possibly guarantee
+	 * consistency.
+	 *
+	 * This pauses future scheduling, but it doesn't interrupt the
+	 * current scheduling, nor does it wait for that scheduling to
+	 * finish. So the txseq window has moved, and those frames
+	 * in the meantime have "normal" completion handlers.
+	 *
+	 * The addba teardown pause/resume likely has the same problem.
+	 */
 	ath_tx_tid_pause(sc, atid);
 
+	DPRINTF(sc, ATH_DEBUG_SW_TX_CTRL,
+	    "%s: called; dialogtoken=%d, baparamset=%d, batimeout=%d\n",
+	    __func__, dialogtoken, baparamset, batimeout);
+	DPRINTF(sc, ATH_DEBUG_SW_TX_CTRL,
+	    "%s: txa_start=%d, ni_txseqs=%d\n",
+	    __func__, tap->txa_start, ni->ni_txseqs[tid]);
+
 	return sc->sc_addba_request(ni, tap, dialogtoken, baparamset,
 	    batimeout);
 }
@@ -2831,6 +2879,18 @@ ath_addba_request(struct ieee80211_node 
  *
  * Any packets TX'ed from this point should be "aggregate" (whether
  * aggregate or not) so the BAW is updated.
+ *
+ * Note! net80211 keeps self-assigning sequence numbers until
+ * ampdu is negotiated. This means the initially-negotiated BAW left
+ * edge won't match the ni->ni_txseq.
+ *
+ * So, being very dirty, the BAW left edge is "slid" here to match
+ * ni->ni_txseq.
+ *
+ * What likely SHOULD happen is that all packets subsequent to the
+ * addba request should be tagged as aggregate and queued as non-aggregate
+ * frames; thus updating the BAW. For now though, I'll just slide the
+ * window.
  */
 int
 ath_addba_response(struct ieee80211_node *ni, struct ieee80211_tx_ampdu *tap,
@@ -2842,6 +2902,14 @@ ath_addba_response(struct ieee80211_node
 	struct ath_tid *atid = &an->an_tid[tid];
 	int r;
 
+	DPRINTF(sc, ATH_DEBUG_SW_TX_CTRL,
+	    "%s: called; status=%d, code=%d, batimeout=%d\n", __func__,
+	    status, code, batimeout);
+
+	DPRINTF(sc, ATH_DEBUG_SW_TX_CTRL,
+	    "%s: txa_start=%d, ni_txseqs=%d\n",
+	    __func__, tap->txa_start, ni->ni_txseqs[tid]);
+
 	/*
 	 * Call this first, so the interface flags get updated
 	 * before the TID is unpaused. Otherwise a race condition
@@ -2850,7 +2918,13 @@ ath_addba_response(struct ieee80211_node
 	 */
 	r = sc->sc_addba_response(ni, tap, status, code, batimeout);
 
-	DPRINTF(sc, ATH_DEBUG_SW_TX_CTRL, "%s: called\n", __func__);
+	/*
+	 * XXX dirty!
+	 * Slide the BAW left edge to wherever net80211 left it for us.
+	 * Read above for more information.
+	 */
+	tap->txa_start = ni->ni_txseqs[tid];
+
 	ath_tx_tid_resume(sc, atid);
 	return r;
 }



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201108172359.p7HNxt7K028186>