From owner-freebsd-net@FreeBSD.ORG Fri Sep 24 17:44:50 2010
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
 by hub.freebsd.org (Postfix) with ESMTP id E1F4C1065695;
 Fri, 24 Sep 2010 17:44:50 +0000 (UTC) (envelope-from tom@tomjudge.com)
Received: from eu1sys200aog107.obsmtp.com (eu1sys200aog107.obsmtp.com [207.126.144.123])
 by mx1.freebsd.org (Postfix) with SMTP id 41ACF8FC08;
 Fri, 24 Sep 2010 17:44:48 +0000 (UTC)
Received: from source ([63.174.175.251])
 by eu1sys200aob107.postini.com ([207.126.147.11]) with SMTP
 ID DSNKTJzjjuAVnnI3LGFQSd0xULCK4L2bQczo@postini.com;
 Fri, 24 Sep 2010 17:44:49 UTC
Received: from [172.17.10.53] (unknown [172.17.10.53])
 by bbbx3.usdmm.com (Postfix) with ESMTP id 2747EFD01D;
 Fri, 24 Sep 2010 17:44:46 +0000 (UTC)
Message-ID: <4C9CE380.6020906@tomjudge.com>
Date: Fri, 24 Sep 2010 12:44:32 -0500
From: Tom Judge
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.12)
 Gecko/20100915 Lightning/1.0b1 Thunderbird/3.0.8
MIME-Version: 1.0
To: David Christensen
References: <4C894A76.5040200@tomjudge.com>
 <20100910002439.GO7203@michelle.cdnetworks.com>
 <4C8E3D79.6090102@tomjudge.com>
 <20100913184833.GF1229@michelle.cdnetworks.com>
 <4C8E768E.7000003@tomjudge.com>
 <20100913193322.GG1229@michelle.cdnetworks.com>
 <4C8E8BD1.5090007@tomjudge.com>
 <20100913205348.GJ1229@michelle.cdnetworks.com>
 <4C9B6CBD.2030408@tomjudge.com>
 <5D267A3F22FD854F8F48B3D2B52381933B5A78B484@IRVEXCHCCR01.corp.ad.broadcom.com>
 <4C9BA9FD.50406@tomjudge.com>
 <4C9BABA4.1060805@tomjudge.com>
In-Reply-To: <4C9BABA4.1060805@tomjudge.com>
X-Enigmail-Version: 1.0.1
Content-Type: multipart/mixed;
 boundary="------------000604030004000001090500"
Cc: "pyunyh@gmail.com", "freebsd-net@freebsd.org", "yongari@freebsd.org"
Subject: Re: bce(4) - com_no_buffers (Again)
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
X-List-Received-Date: Fri, 24 Sep 2010 17:44:51 -0000

This is a multi-part message in MIME format.
--------------000604030004000001090500
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

On 09/23/2010 02:33 PM, Tom Judge wrote:
> The throttle command I am using in the tests is the one from here:
>
> http://klicman.org/throttle/
>
>
> On 09/23/2010 02:26 PM, Tom Judge wrote:
>> On 09/23/2010 01:21 PM, David Christensen wrote:
>>>>>> Under testing I have yet to see a memory fragmentation issue with this
>>>>>> driver. I follow up if/when I find a problem with this again.
>>>>
>>>> So here we are again. The system is locking up again because of 9k mbuf
>>>> allocation failures.
>>>
>>> Failure to allocate a new buffer should cause the driver to
>>> drop the received frame and reuse the buffer, not lock up the
>>> system. Are you seeing the lockup come from bce(4) or does
>>> it come from somewhere else due to the dropped data?
>>
>> The lockup is not from the NIC as such, the systems have the appearance
>> of locking up as home directories are on NFS and the user information is
>> stored in a remote LDAP server. When the system starts to drop frames
>> due to lack of 9k memory regions it tends to last for a few minutes
>> (when it is really bad) and stop all traffic into the system. This
>> appears to the average user as a complete system pause.
>>
>>>>>> Is there a way to fix the RX buffer shortage issues (when header
>>>>>> splitting is turned on) so that they are guarded by flow control.
>>>>>> Maybe change the low watermark for flow control when its enabled?
>>>>>
>>>>> I'm not sure how much it would help but try changing RX low
>>>>> watermark. Default value is 32 which seems to be reasonable value.
>>>>> But it's only for 5709/5716 controllers and Linux seems to use
>>>>> different default value.
>>>>
>>>> These are: NetXtreme II BCM5709 Gigabit Ethernet
>>>>
>>>> So my next task is to turn the watermark related defines into sysctls
>>>> and turn on header splitting so that I can try to tune them without
>>>> having to reboot.
>>>
>>> Do you have flow control enabled? There are arguments both for
>>> and against flow control. For bce(4), I haven't tested flow control
>>> for quite a while and it's behavior may have changed since it is
>>> controlled by firmware. Keep an eye on the hardware statistics
>>> to see that's it's actively generating pause frames.
>>
>> 3) With flow control enabled and header splitting on flood the server
>> with very small frames (200 bytes). (Using the same test as in case 1).
>> My aim is to tune the watermark here so that there are no frames dropped
>> due to BD shortages.
>>

Card info unhidden:

bce0: ASIC (0x57092003); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.2); Flags (SPLT|MSI|MFW); MFW (NCSI 2.0.8)

So having done lots of testing with flow control turned on as well as
header splitting, it seems like flow control may be broken with header
splitting?

I have been using the attached patch to play with the flow control
watermarks.

I have tried the following data points and am finding it difficult to
get flow control to kick in before the card runs out of descriptors and
starts dropping frames:

low: 16  high: 127
low: 32  high: 127
low: 64  high: 127
low: 96  high: 127
low: 32  high: 196
low: 64  high: 196
low: 128 high: 256

None of these seem to have any noticeable effect on the drop rate or on
the number of dev.bce.0.stat_FlowControlDone's in the sample period.

Thoughts?
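
For reference, here is a minimal sketch of the pre-scaling and post-scaling arithmetic that the attached patch prints, so the data points above can be checked by hand. The scale and shift constants are assumptions (low scale 4, high scale 16, low shift 0, high shift 4, each field 4 bits wide) and should be checked against if_bcereg.h. The helper names scale_watermarks, pack_watermarks and print_watermark_table are hypothetical and not part of the driver.

```c
#include <stdint.h>
#include <stdio.h>

/* ASSUMED values; verify against if_bcereg.h before relying on them. */
#define LO_WATER_MARK_SCALE	4	/* assumed BCE_L2CTX_RX_LO_WATER_MARK_SCALE */
#define HI_WATER_MARK_SCALE	16	/* assumed BCE_L2CTX_RX_HI_WATER_MARK_SCALE */
#define LO_WATER_MARK_SHIFT	0	/* assumed BCE_L2CTX_RX_LO_WATER_MARK_SHIFT */
#define HI_WATER_MARK_SHIFT	4	/* assumed BCE_L2CTX_RX_HI_WATER_MARK_SHIFT */

struct scaled_marks {
	uint32_t lo;
	uint32_t hi;
};

/* Same sequence of checks as the hunk around the two BCE_PRINTF calls. */
struct scaled_marks
scale_watermarks(uint32_t lo_water, uint32_t hi_water)
{
	struct scaled_marks m;

	if (hi_water <= lo_water)
		lo_water = 0;

	lo_water /= LO_WATER_MARK_SCALE;
	hi_water /= HI_WATER_MARK_SCALE;

	if (hi_water > 0xf)
		hi_water = 0xf;		/* high mark is clamped */
	else if (hi_water == 0)
		lo_water = 0;
	/* Note: the low mark is not clamped anywhere in this sequence. */

	m.lo = lo_water;
	m.hi = hi_water;
	return (m);
}

/* Combine the scaled marks the way the patch ORs them into val. */
uint32_t
pack_watermarks(struct scaled_marks m)
{
	return ((m.lo << LO_WATER_MARK_SHIFT) | (m.hi << HI_WATER_MARK_SHIFT));
}

/* Print the scaled and packed values for the data points tried above. */
void
print_watermark_table(void)
{
	static const uint32_t points[][2] = {
		{ 16, 127 }, { 32, 127 }, { 64, 127 }, { 96, 127 },
		{ 32, 196 }, { 64, 196 }, { 128, 256 }
	};
	size_t i;

	for (i = 0; i < sizeof(points) / sizeof(points[0]); i++) {
		struct scaled_marks m =
		    scale_watermarks(points[i][0], points[i][1]);
		printf("low %3u high %3u -> scaled low %2u high %2u, bits 0x%02x\n",
		    (unsigned)points[i][0], (unsigned)points[i][1],
		    (unsigned)m.lo, (unsigned)m.hi,
		    (unsigned)pack_watermarks(m));
	}
}
```

Calling print_watermark_table() from a small main() prints one line per data point. If the assumed constants are right, a low value of 64 or more scales to 16 or more, which would not fit in a 4-bit field and would spill into the high mark bits, so it may be worth checking the post-scaling printout for those cases.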

Tom

-- 
TJU13-ARIN

--------------000604030004000001090500
Content-Type: text/plain;
 name="if_bce.patch.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="if_bce.patch.txt"

Index: if_bce.c
===================================================================
--- if_bce.c	(revision 949)
+++ if_bce.c	(working copy)
@@ -511,6 +511,21 @@
 SYSCTL_UINT(_hw_bce, OID_AUTO, msi_enable, CTLFLAG_RDTUN, &bce_msi_enable, 0,
 "MSI-X|MSI|INTx selector");
 
+
+/* Tunable RX flow control low water mark. */
+/* Without header splitting the default is 32 */
+static int bce_rx_low_water_mark = BCE_L2CTX_RX_LO_WATER_MARK_DEFAULT;
+TUNABLE_INT("hw.bce.rx_low_water_mark", &bce_rx_low_water_mark);
+SYSCTL_UINT(_hw_bce, OID_AUTO, rx_low_water_mark, CTLFLAG_RDTUN, &bce_rx_low_water_mark, 0,
+"Default RX Flow Control Low Water Mark");
+
+/* Tunable RX flow control high water mark. */
+/* Without header splitting the default is 32 */
+static int bce_rx_high_water_mark = USABLE_RX_BD / 4;
+TUNABLE_INT("hw.bce.rx_high_water_mark", &bce_rx_high_water_mark);
+SYSCTL_UINT(_hw_bce, OID_AUTO, rx_high_water_mark, CTLFLAG_RDTUN, &bce_rx_high_water_mark, 0,
+"Default RX Flow Control High Water Mark");
+
 /* ToDo: Add tunable to enable/disable strict MTU handling. */
 /* Currently allows "loose" RX MTU checking (i.e. sets the */
 /* H/W RX MTU to the size of the largest receive buffer, or */
@@ -1780,11 +1795,15 @@
 	}
 
 	if (mii->mii_media_active & IFM_FLAG1) {
+		BCE_PRINTF("%s(%d): Enabling TX flow control.\n",
+			__FILE__, __LINE__);
 		DBPRINT(sc, BCE_INFO_PHY,
 			"%s(): Enabling TX flow control.\n", __FUNCTION__);
 		BCE_SETBIT(sc, BCE_EMAC_TX_MODE, BCE_EMAC_TX_MODE_FLOW_EN);
 		sc->bce_flags |= BCE_USING_TX_FLOW_CONTROL;
 	} else {
+		BCE_PRINTF("%s(%d): Disabling TX flow control.\n",
+			__FILE__, __LINE__);
 		DBPRINT(sc, BCE_INFO_PHY,
 			"%s(): Disabling TX flow control.\n", __FUNCTION__);
 		BCE_CLRBIT(sc, BCE_EMAC_TX_MODE, BCE_EMAC_TX_MODE_FLOW_EN);
@@ -5414,7 +5433,7 @@
 	u32 lo_water, hi_water;
 
 	if (sc->bce_flags && BCE_USING_TX_FLOW_CONTROL) {
-		lo_water = BCE_L2CTX_RX_LO_WATER_MARK_DEFAULT;
+		lo_water = bce_rx_low_water_mark;
 	} else {
 		lo_water = 0;
 	}
@@ -5423,11 +5442,12 @@
 		lo_water = 0;
 	}
 
-	hi_water = USABLE_RX_BD / 4;
+	hi_water = bce_rx_high_water_mark;
 
 	if (hi_water <= lo_water) {
 		lo_water = 0;
 	}
 
+	BCE_PRINTF("Setting Up Flow Control (Pre Scaling), Low Watermark: %d, High Watermark: %d\n", (int)lo_water, (int)hi_water);
 	lo_water /= BCE_L2CTX_RX_LO_WATER_MARK_SCALE;
 	hi_water /= BCE_L2CTX_RX_HI_WATER_MARK_SCALE;
 
@@ -5436,7 +5456,8 @@
 		hi_water = 0xf;
 	else if (hi_water == 0)
 		lo_water = 0;
-	
+
+	BCE_PRINTF("Setting Up Flow Control (Post Scaling), Low Watermark: %d, High Watermark: %d\n", (int)lo_water, (int)hi_water);
 	val |= (lo_water << BCE_L2CTX_RX_LO_WATER_MARK_SHIFT) |
 		(hi_water << BCE_L2CTX_RX_HI_WATER_MARK_SHIFT);
 }

--------------000604030004000001090500--