From owner-freebsd-bugs@FreeBSD.ORG Fri Apr 20 00:30:11 2012
Resent-Date: Fri, 20 Apr 2012 00:30:10 GMT
Resent-Message-Id: <201204200030.q3K0UAmf036212@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Adrian Chad
Message-Id: <201204200020.q3K0KdWs026173@red.freebsd.org>
Date: Fri, 20 Apr 2012 00:20:39 GMT
From: Adrian Chad
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Subject: misc/167113: [ath] AR5210: "stuck" TX seems to be occurring, without watchdog timeout firing
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
X-List-Received-Date: Fri, 20 Apr 2012 00:30:11 -0000

>Number:         167113
>Category:       misc
>Synopsis:       [ath] AR5210: "stuck" TX seems to be occurring, without watchdog timeout firing
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Apr 20 00:30:10 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Adrian Chad
>Release:        9.0-RELEASE i386, with -HEAD net80211/ath
>Organization:
>Environment:
>Description:
When using an AR5210 NIC with background scanning disabled (-bgscan), I've noticed that TX will occasionally hang. A 'scan' (which resets the NIC) will make things work again.

A watchdog timeout isn't occurring, so the watchdog is being tickled somehow. However, the hardware data TXQ shows 2-3 frames actually in the queue, and a number of frames are also being buffered in the software queue.

The relevant dmesg output (sysctl dev.ath.1.txagg=1):

.
.
HW TXQ 0: axq_depth=2, axq_aggr_depth=0
HW TXQ 1: axq_depth=0, axq_aggr_depth=0
Total TX buffers: 77; Total TX buffers busy: 0

(here, ifconfig wlan1 scan)

wlan1: [00:24:6c:04:ed:39] sta power save mode on
ar5210: dma receive failed to stop in 10ms AR_CR=0x24 AR_DIAG_SW=0x40
ath1: ath_tx_tid_drain: node 0xc78c6000: bf=0xc787b570: addbaw=0, dobaw=0, seqno_assign=0, seqno_required=0, seqno=-1, retry=0
ath1: ath_tx_tid_drain: node 0xc78c6000: bf=0xc787b570: tid txq_depth=51 hwq_depth=0
ath1: ath_tx_tid_drain: node 0xc78c6000: bf=0xc787b570: tid 16: txq_depth=0, txq_aggr_depth=0, sched=1, paused=0, hwq_depth=0, incomp=0, baw_head=0, baw_tail=0 txa_start=-1, ni_txseqs=45773
TODS 00:30:ab:17:81:47->00:1f:6c:9a:3f:1b(00:24:6c:04:ed:39) data 0M
0801 0000 0024 6c04 ed39 0030 ab17 8147 001f 6c9a 3f1b a029 aaaa 0300 0000 0800 4510 0034 fe95 4000 4006 a3ea c0a8 643c cb38 a816 28bf 0016 eae0 2e41 38e2 060a 8010 0401 383a 0000 0101 080a 158d 61be 067f a3a0

>How-To-Repeat:
* Bring the AR5210 'up'
* Disable bgscan (ifconfig wlanX -bgscan)
* Generate a small amount of traffic (e.g. web, ssh) and watch it occasionally hang
* Check the output of sysctl dev.ath.X.txagg=1

>Fix:
I'm not sure. I don't know why frames are going into the software queue here: no aggregation has been negotiated, so in theory everything _should_ be hardware queued.

However, ath_tx_swq() is incorrectly checking the hardware queue depth against sc_hwq_limit for non-aggregate traffic, so that traffic is being software queued. So I -think- that in this case, non-aggregate traffic is still being software queued _and_ only two frames are ever queued to the hardware. That's likely very sub-optimal, but it's what makes this particular bug rear its ugly head.

What I need to check:

* Are we somehow missing TX interrupts? (e.g. RAC-style bugs)
* There are frames in the hardware TXQ, so have they actually completed? I should turn on reset debugging (sysctl dev.ath.1.debug=0x20) and see what the descriptor dump looks like. If they've completed, a TX interrupt should have occurred.
* Am I also getting TXEOL from the AR5210? That's how the TX interrupt mitigation technique is supposed to work.

>Release-Note:
>Audit-Trail:
>Unformatted:
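
As a rough illustration of the queueing decision discussed in the Fix section, here is a minimal C sketch. It is a simplified model and not the ath(4) driver code: struct txq_model, tx_path_reported() and tx_path_alternative() are hypothetical names, and hwq_limit merely stands in for sc_hwq_limit. It only contrasts the behaviour described in the report (the depth check applies to all traffic) with one possible alternative (the depth check applies only once aggregation has been negotiated).

```c
#include <stdbool.h>

/* Hypothetical, simplified model; NOT the real ath(4) structures. */
struct txq_model {
    int hwq_depth;  /* frames currently queued to the hardware TXQ */
    int hwq_limit;  /* stand-in for sc_hwq_limit (assumed name mapping) */
};

enum tx_path { TX_PATH_HARDWARE, TX_PATH_SOFTWARE };

/*
 * Behaviour as described in the report: the hardware queue depth is
 * compared against the limit regardless of whether aggregation has
 * been negotiated, so non-aggregate frames can be software queued.
 */
enum tx_path
tx_path_reported(const struct txq_model *q, bool aggr_negotiated)
{
    (void)aggr_negotiated;  /* deliberately ignored in this variant */
    return (q->hwq_depth >= q->hwq_limit) ?
        TX_PATH_SOFTWARE : TX_PATH_HARDWARE;
}

/*
 * One possible alternative (an assumption, not a confirmed fix):
 * non-aggregate traffic always goes straight to the hardware queue,
 * and the depth limit only holds back aggregation-capable traffic.
 */
enum tx_path
tx_path_alternative(const struct txq_model *q, bool aggr_negotiated)
{
    if (!aggr_negotiated)
        return TX_PATH_HARDWARE;
    return (q->hwq_depth >= q->hwq_limit) ?
        TX_PATH_SOFTWARE : TX_PATH_HARDWARE;
}
```

With hwq_depth = 2 and hwq_limit = 2 (values assumed here because the dmesg output shows axq_depth=2), tx_path_reported() sends a non-aggregate frame to the software queue, while tx_path_alternative() hands it to the hardware. Whether bypassing the limit is the right fix is not established by this report; the stuck frames already in the hardware TXQ still need explaining.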