Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 23 Jan 2017 04:30:08 +0000 (UTC)
From:      Adrian Chadd <adrian@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   svn commit: r312661 - head/sys/dev/ath
Message-ID:  <201701230430.v0N4U8Ep010118@repo.freebsd.org>

next in thread | raw e-mail | index | archive | help
Author: adrian
Date: Mon Jan 23 04:30:08 2017
New Revision: 312661
URL: https://svnweb.freebsd.org/changeset/base/312661

Log:
  [ath] fix thresholds for deciding to queue to the software queue and populate hardware frames
  
  This is two fixes, which establishes what I /think/ is pretty close to the
  theoretical PHY maximum speed on the AR9380 devices.
  
  * When doing A-MPDU on a TID, don't queue to the hardware directly if
    the hardware queue is busy.  This gives us time to get more packets
    queued up (and the hardware is busy, so there's no point in queuing
    more to the hardware right now) to potentially form an A-MPDU.
  
    This fixes up the throughput issue I was seeing where a couple hundred
    single frames were being sent a second interspersed between A-MPDU
    frames.  It just happened that the software queue had exactly one
    frame in it at that point.  Queuing it until the hardware finishes
    transmitting isn't exactly costly.
  
  * When determining whether to dequeue from a software node/TID queue into
    the hardware queue, fix up the checks to work right for EDMA chips
    (ar9380 and later.)   Before it was not dispatching anything until
    the FIFO was empty.  Now we allow it to dispatch another aggregate
    up to the hardware aggregate limit, like I intended with the earlier
    work.
  
  This allows a 5GHz HT40, short-GI, "htprotmode off" test at MCS23
  to achieve 357 Mbit/sec in a one-way UDP test.  The stars have to be
  aligned /just right/ so there are no retries but it can happen.
  Just don't expect it to work in an OTA test if your 2yo is running
  around the room - MCS23 is very very sensitive to channel conditions.
  
  Tested:
  
  * AR9380 STA (test) -> AR9580 hostap
  
  TODO:
  
  * More thorough testing on pre-AR9380 chips (AR5416, AR9160, AR9280)
  * (Finally) teach ath_rate_sample about throughput/latency rather than
    air time, so I can get good transmit rates with a 2yo running around.

Modified:
  head/sys/dev/ath/if_ath_tx.c

Modified: head/sys/dev/ath/if_ath_tx.c
==============================================================================
--- head/sys/dev/ath/if_ath_tx.c	Mon Jan 23 04:20:37 2017	(r312660)
+++ head/sys/dev/ath/if_ath_tx.c	Mon Jan 23 04:30:08 2017	(r312661)
@@ -3129,7 +3129,21 @@ ath_tx_swq(struct ath_softc *sc, struct 
 		ATH_TID_INSERT_TAIL(atid, bf, bf_list);
 		/* XXX sched? */
 	} else if (ath_tx_ampdu_running(sc, an, tid)) {
-		/* AMPDU running, attempt direct dispatch if possible */
+		/*
+		 * AMPDU running, queue single-frame if the hardware queue
+		 * isn't busy.
+		 *
+		 * If the hardware queue is busy, sending an aggregate frame
+		 * then just hold off so we can queue more aggregate frames.
+		 *
+		 * Otherwise we may end up with single frames leaking through
+		 * because we are dispatching them too quickly.
+		 *
+		 * TODO: maybe we should treat this as two policies - minimise
+		 * latency, or maximise throughput.  Then for BE/BK we can
+		 * maximise throughput, and VO/VI (if AMPDU is enabled!)
+		 * minimise latency.
+		 */
 
 		/*
 		 * Always queue the frame to the tail of the list.
@@ -3138,18 +3152,18 @@ ath_tx_swq(struct ath_softc *sc, struct 
 
 		/*
 		 * If the hardware queue isn't busy, direct dispatch
-		 * the head frame in the list.  Don't schedule the
-		 * TID - let it build some more frames first?
+		 * the head frame in the list.
 		 *
-		 * When running A-MPDU, always just check the hardware
-		 * queue depth against the aggregate frame limit.
-		 * We don't want to burst a large number of single frames
-		 * out to the hardware; we want to aggressively hold back.
+		 * Note: if we're say, configured to do ADDBA but not A-MPDU
+		 * then maybe we want to still queue two non-aggregate frames
+		 * to the hardware.  Again with the per-TID policy
+		 * configuration..)
 		 *
 		 * Otherwise, schedule the TID.
 		 */
 		/* XXX TXQ locking */
-		if (txq->axq_depth + txq->fifo.axq_depth < sc->sc_hwq_limit_aggr) {
+		if (txq->axq_depth + txq->fifo.axq_depth == 0) {
+
 			bf = ATH_TID_FIRST(atid);
 			ATH_TID_REMOVE(atid, bf, bf_list);
 
@@ -5615,19 +5629,40 @@ ath_txq_sched(struct ath_softc *sc, stru
 	ATH_TX_LOCK_ASSERT(sc);
 
 	/*
-	 * Don't schedule if the hardware queue is busy.
-	 * This (hopefully) gives some more time to aggregate
-	 * some packets in the aggregation queue.
-	 *
-	 * XXX It doesn't stop a parallel sender from sneaking
-	 * in transmitting a frame!
+	 * For non-EDMA chips, aggr frames that have been built are
+	 * in axq_aggr_depth, whether they've been scheduled or not.
+	 * There's no FIFO, so txq->axq_depth is what's been scheduled
+	 * to the hardware.
+	 *
+	 * For EDMA chips, we do it in two stages.  The existing code
+	 * builds a list of frames to go to the hardware and the EDMA
+	 * code turns it into a single entry to push into the FIFO.
+	 * That way we don't take up one packet per FIFO slot.
+	 * We do push one aggregate per FIFO slot though, just to keep
+	 * things simple.
+	 *
+	 * The FIFO depth is what's in the hardware; the txq->axq_depth
+	 * is what's been scheduled to the FIFO.
+	 *
+	 * fifo.axq_depth is the number of frames (or aggregates) pushed
+	 *  into the EDMA FIFO.  For multi-frame lists, this is the number
+	 *  of frames pushed in.
+	 * axq_fifo_depth is the number of FIFO slots currently busy.
 	 */
-	/* XXX TXQ locking */
-	if (txq->axq_aggr_depth + txq->fifo.axq_depth >= sc->sc_hwq_limit_aggr) {
+
+	/* For EDMA and non-EDMA, check built/scheduled against aggr limit */
+	if (txq->axq_aggr_depth >= sc->sc_hwq_limit_aggr) {
 		sc->sc_aggr_stats.aggr_sched_nopkt++;
 		return;
 	}
-	if (txq->axq_depth >= sc->sc_hwq_limit_nonaggr) {
+
+	/*
+	 * For non-EDMA chips, axq_depth is the "what's scheduled to
+	 * the hardware list".  For EDMA it's "What's built for the hardware"
+	 * and fifo.axq_depth is how many frames have been dispatched
+	 * already to the hardware.
+	 */
+	if (txq->axq_depth + txq->fifo.axq_depth >= sc->sc_hwq_limit_nonaggr) {
 		sc->sc_aggr_stats.aggr_sched_nopkt++;
 		return;
 	}



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201701230430.v0N4U8Ep010118>