From owner-freebsd-bugs@FreeBSD.ORG Fri May 10 10:00:00 2013
Message-Id: <201305100950.r4A9oKXr091256@oldred.FreeBSD.org>
Date: Fri, 10 May 2013 09:50:20 GMT
From: adrian chadd
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Subject: kern/178477: [ath] missed beacon / soft reset in STA mode results in hardware error and DMA engine lockup

>Number:         178477
>Category:       kern
>Synopsis:       [ath] missed beacon / soft reset in STA mode results in hardware error and DMA engine lockup
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri May 10 10:00:00 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     adrian chadd
>Release:        -HEAD
>Organization:
>Environment:
>Description:
With my most recent changes in ath(4) to the TX DMA list (i.e., only writing new TxDP entries for a queue for the first frame being sent after reset, then always using the holding descriptor and link pointer for subsequent frames) I've uncovered a rather annoying bug.

If a no-loss reset is done (i.e., no packets are lost), the hardware ends up locking up.

This is triggerable in STA mode. AP mode doesn't (for now) seem to be a problem.

What's seen:

ath0: hardware error; resetting
ath0: 0x00000000 0x00000020 0x00000000, 0x00000000 0x00000000 0x00000000
ar5416StopDmaReceive: dma failed to stop in 10ms
AR_CR=0x00000024 AR_DIAG_SW=0x42000020

After this point, no combination of soft or hard chip reset unlocks the DMA engine.

When reset debugging is enabled, the queue looks like this:

ath0: ath_tx_stopdma: tx queue [3] 0, active=1, hwpending=1, flags 0x00000000, link 0x

As far as I'm aware, the TX queue TxDP should never be 0x0 if it's active.

Anyway. This is easy to reproduce.
>How-To-Repeat:
* Insert AR5416 card
* Create STA vap
* Associate to AP
* Force a 'stuck beacon' no-loss reset - sysctl dev.ath.X.forcebstuck=1
* .. the next transmission will cause a hardware error.
>Fix:
Not sure yet. There aren't many things that can go wrong here:

* is there a frame on the TXQ that's actually already been freed?
* is the holding descriptor not being freed during a soft reset?
* .. and what about the link pointer? It should be set to NULL during reset, then the DMA restart routine should re-initialise the link pointer to the last descriptor in the last frame in the list, or to NULL if the list is empty.

Actually, I just hacked on the DMA restart code to ensure that the link pointer is either initialised to the last descriptor in the list or NULL. That seems to have fixed it.

So, the reset path isn't freeing the holding descriptor or NULL'ing the axq_link pointer. Fix that!
>Release-Note:
>Audit-Trail:
>Unformatted:
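
For reference, the link pointer rule described under >Fix: can be sketched roughly as below. This is only a simplified model, not the actual ath(4) code or the committed fix: the sketch_* structures and functions are hypothetical stand-ins, and only the axq_link name comes from the report above. The reset step clears the link pointer; the restart step walks the queued frames and points the link at the last descriptor of the last frame, or leaves it NULL if the queue is empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real ath(4) structures. */
struct sketch_desc {
	uint32_t ds_link;		/* address of the next descriptor, 0 = end of chain */
};

struct sketch_buf {
	struct sketch_desc *bf_lastds;	/* last descriptor of this frame */
	struct sketch_buf *bf_next;	/* next frame on the queue, NULL = tail */
};

struct sketch_txq {
	struct sketch_buf *axq_head;	/* first queued frame, NULL if empty */
	uint32_t *axq_link;		/* link field to patch when chaining the next frame */
};

/* Reset path (assumed behaviour): forget any stale link pointer. */
static void
sketch_txq_reset(struct sketch_txq *txq)
{
	txq->axq_link = NULL;
}

/*
 * Restart path (assumed behaviour): point the link at the last
 * descriptor of the last queued frame, or NULL if nothing is queued.
 */
static void
sketch_txq_restart(struct sketch_txq *txq)
{
	struct sketch_buf *bf, *last = NULL;

	for (bf = txq->axq_head; bf != NULL; bf = bf->bf_next)
		last = bf;

	txq->axq_link = (last != NULL) ? &last->bf_lastds->ds_link : NULL;
}

/* Sanity check: an empty queue must not carry a link pointer. */
static void
sketch_txq_check(const struct sketch_txq *txq)
{
	if (txq->axq_head == NULL)
		assert(txq->axq_link == NULL);
}
```

Calling sketch_txq_reset() and then sketch_txq_restart() on a queue holding two frames leaves axq_link pointing at the ds_link field of the second frame's last descriptor. On an empty queue it stays NULL, so the first frame sent afterwards has to be started by writing TxDP, not by chaining through a stale link.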