From owner-svn-src-user@FreeBSD.ORG Thu Sep 29 17:13:23 2011 Return-Path: Delivered-To: svn-src-user@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A3C8B106566B; Thu, 29 Sep 2011 17:13:23 +0000 (UTC) (envelope-from adrian@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id 940868FC08; Thu, 29 Sep 2011 17:13:23 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.4/8.14.4) with ESMTP id p8THDN61032057; Thu, 29 Sep 2011 17:13:23 GMT (envelope-from adrian@svn.freebsd.org) Received: (from adrian@localhost) by svn.freebsd.org (8.14.4/8.14.4/Submit) id p8THDN09032055; Thu, 29 Sep 2011 17:13:23 GMT (envelope-from adrian@svn.freebsd.org) Message-Id: <201109291713.p8THDN09032055@svn.freebsd.org> From: Adrian Chadd Date: Thu, 29 Sep 2011 17:13:23 +0000 (UTC) To: src-committers@freebsd.org, svn-src-user@freebsd.org X-SVN-Group: user MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: Subject: svn commit: r225879 - user/adrian/if_ath_tx/sys/dev/ath X-BeenThere: svn-src-user@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the experimental " user" src tree" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Sep 2011 17:13:23 -0000 Author: adrian Date: Thu Sep 29 17:13:23 2011 New Revision: 225879 URL: http://svn.freebsd.org/changeset/base/225879 Log: Put in a temporary workaround for the strange rx behaviour I was seeing when RXEOL occured. Normally, I'd expect RXEOL to be followed by a dequeue of ATH_RXBUF number of packets (here 512 for 11n.) But I'd occasionally see this only service a handful at a time (and sometimes it'd service 0 frames!) Since the RX descriptor list should be consistent and correctly linked together after being processed, I gathered that something was interfering with this process (eg rxlink was being set to NULL.) I gathered that kickpcu and the interrupt mask update was racing. So to work around it (and it's stopped the issue appearing so far) : * clear kickpcu after the queue reset has been re-established and the RX has recommenced. This likely won't be very good for SMP machines as the taskqueue and ath_intr() calls can occur simultaneously on different CPUs. This means I'll have to eventually correctly lock this (or use atomics here.) * If a MIB interrupt also pops in at the same time and kickpcu is currently 1, don't reset the interrupt mask. This stops the RXEOL/RXORN interrupt from being triggered until we've handled and cleared the original case. As mentioned, this is only somewhat correct (read: not correct) for the single CPU case, and definitely still very racy for the SMP case. I'll need to come up with a better solution for this kickpcu stuff. Modified: user/adrian/if_ath_tx/sys/dev/ath/if_ath.c Modified: user/adrian/if_ath_tx/sys/dev/ath/if_ath.c ============================================================================== --- user/adrian/if_ath_tx/sys/dev/ath/if_ath.c Thu Sep 29 15:43:02 2011 (r225878) +++ user/adrian/if_ath_tx/sys/dev/ath/if_ath.c Thu Sep 29 17:13:23 2011 (r225879) @@ -1447,9 +1447,9 @@ ath_intr(void *arg) * is in the RX queue. * This will then kick the PCU. */ - taskqueue_enqueue_fast(sc->sc_tq, &sc->sc_rxtask); sc->sc_rxlink = NULL; sc->sc_kickpcu = 1; + taskqueue_enqueue_fast(sc->sc_tq, &sc->sc_rxtask); } if (status & HAL_INT_TXURN) { sc->sc_stats.ast_txurn++; @@ -1484,7 +1484,14 @@ ath_intr(void *arg) * clear whatever condition caused the interrupt. */ ath_hal_mibevent(ah, &sc->sc_halstats); - ath_hal_intrset(ah, sc->sc_imask); + /* + * Don't reset the interrupt if we've just + * kicked the PCU, or we may get a nested + * RXEOL before the rxproc has had a chance + * to run. + */ + if (sc->sc_kickpcu == 0) + ath_hal_intrset(ah, sc->sc_imask); } if (status & HAL_INT_RXORN) { /* NB: hal marks HAL_INT_FATAL when RXORN is fatal */ @@ -4057,24 +4064,34 @@ rx_next: * need to be handled, kick the PCU if there's * been an RXEOL condition. */ + /* + * XXX TODO! + * It is very likely that we're unfortunately + * racing with other places where ath_hal_intrset() + * may be called. It may be that we do need to + * add some more locking (eg the pcu lock from ath9k/ + * reference), or introduce some other way to cope + * with this. + */ if (sc->sc_kickpcu) { - sc->sc_kickpcu = 0; CTR0(ATH_KTR_ERR, "ath_rx_proc: kickpcu"); - /* - * XXX this causes a 3ms delay; and shuts down a lof - * XXX is it really needed? Or is it just enough to - * XXX kick the PCU again to continue RXing? - */ device_printf(sc->sc_dev, "%s: kickpcu; handled %d packets\n", __func__, npkts); + #if 0 - ath_stoprecv(sc); - sc->sc_imask |= (HAL_INT_RXEOL | HAL_INT_RXORN); + /* + * This re-links all of the descriptors together. + * (Is it possible that somehow, some state/issue + * is leaving us with badly linked descriptors?) + * Is it possible that we're receiving another RXEOL + * _during_ this function? + */ if (ath_startrecv(sc) != 0) { if_printf(ifp, "%s: couldn't restart RX after RXEOL; resetting\n", __func__); ath_reset(ifp); + sc->sc_kickpcu = 0; return; } #endif @@ -4087,6 +4104,7 @@ rx_next: ath_hal_startpcurecv(ah); /* re-enable PCU/DMA engine */ ath_hal_intrset(ah, sc->sc_imask); + sc->sc_kickpcu = 0; } if ((ifp->if_drv_flags & IFF_DRV_OACTIVE) == 0) {