Date: Sat, 17 Nov 2007 21:13:50 +1100 (EST)
From: Bruce Evans
To: Igor Sysoev
Cc: freebsd-net@FreeBSD.org
Subject: Re: bge loader tunables
Message-ID: <20071117194615.L67319@delplex.bde.org>
In-Reply-To: <20071117071053.GA18091@rambler-co.ru>
References: <20071116154019.GE93422@rambler-co.ru> <20071117065908.T65479@delplex.bde.org> <20071117071053.GA18091@rambler-co.ru>

On Sat, 17 Nov 2007, Igor Sysoev wrote:

> On Sat, Nov 17, 2007 at 08:30:58AM +1100, Bruce Evans wrote:
>
>> On Fri, 16 Nov 2007, Igor Sysoev wrote:
>>
>>> The attached patch creates the following bge loader tunables:
>>
>> I plan to commit old work to do this using sysctls. Tunables are
>> harder to use and aren't needed since changes to the defaults aren't
>> needed for booting.
>> I also implemented dynamic tuning for rx coal
>> parameters so that the sysctls are mostly not needed. Ask for patches
>> if you want to test this extensively.
>
> Yes, I can test your patches on 6.2 and 7.0.
> Now bge set the coalescing parameters at attach time.
> Do the sysctl's allow to change them on-the-fly ?
> How does rx dynamic tuning work ?
> Could it be turned off ?

OK, the patch is enclosed at the end, in 2 versions:
- all my patches for bge (with lots of debugging cruft and half-baked
  fixes for 5705+ sysctls).
- edited version with only the coalescing parameter changes.
I haven't used it under 6.2, but have used a similar version in ~5.2,
and it should work in 6.2 except for the 5705+ sysctl fixes.

bge actually sets parameters at init time, and it initializes whenever
the link is brought back up, so the parameters can be changed using
"ifconfig bgeN down up". Several network drivers have interrupt
moderation parameters that can be changed in this way, but it is painful
to change the link status like that, so I have a sysctl
dev.bge.N.program_coal to apply the current parameters to the hardware.
The other sysctls to change the parameters don't apply immediately,
except the one for the rx tuning max interrupt rate, since applying the
changed parameters to the hardware takes more code than a SYSCTL_INT(),
and it is sometimes necessary to change all the parameters together
atomically.

Dynamic tuning works by monitoring the current rx packet rate and
increasing the active rx_max_coal_bds so that the ratio
(rx packet rate) / rx_max_coal_bds is usually <= the specified max rx
interrupt rate. rx_coal_ticks is set to the constant value of the
inverse of the specified max rx interrupt rate (in ticks) on transition
to dynamic mode but IIRC is not changed when the dynamic rate is changed
(not always changing it automatically allows adjusting it independently
of the rate but is often not what is wanted).
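
The arithmetic behind the dynamic tuning described above can be sketched
roughly as follows. This is only an illustration, not the code from the
patch: the helper name dyncoal_bds() and its parameter names are made up
here, and it uses 64-bit intermediates for simplicity. It assumes, as the
patch appears to, that elapsed time is measured in units of 2^-24 seconds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): choose how many rx buffer
 * descriptors to coalesce per interrupt so that
 * (rx packet rate) / bds stays at or below max_intr_freq.
 *
 * dpackets:      rx packets counted since the previous sample
 * dticks:        time since the previous sample, in 2^-24 second units
 * max_intr_freq: target maximum rx interrupt rate in Hz (nonzero)
 * max_bds:       static upper bound (the rx_max_coal_bds setting)
 */
static uint32_t
dyncoal_bds(uint32_t dpackets, uint32_t dticks, uint32_t max_intr_freq,
    uint32_t max_bds)
{
	uint64_t bds, pfrac, scale;

	assert(max_intr_freq != 0);

	/* Fixed-point reciprocal of the target interrupt rate. */
	scale = ((uint64_t)1 << 24) / max_intr_freq;
	pfrac = (uint64_t)dpackets * scale;

	/*
	 * "| 2" keeps the divisor nonzero; "+ 1" guarantees at least
	 * one descriptor per interrupt when the packet rate is low.
	 */
	bds = pfrac / (dticks | 2) + 1;

	/* Never exceed the statically configured limit. */
	if (bds > max_bds)
		bds = max_bds;
	return ((uint32_t)bds);
}
```

For instance, with a 10000 Hz limit and a static limit of 128, a sample
of 100000 packets over one second (dticks = 1 << 24) gives 10
descriptors per interrupt, and an idle interface gives 1.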
The transition has some bias towards lower latency over too many
interrupts, so that short bursts don't increase the latency. I think
this simple algorithm is good enough provided the load (in rx
packets/second) doesn't oscillate rapidly.

Dynamic tuning requires efficient reprogramming of at least one of the
hardware coal registers so that the tuning can respond rapidly to
changes. I have 2 methods for this:
- bge_careful_coal = 1 avoids using a potentially very long busy-wait
  loop in the interrupt handler by giving up on reprogramming the host
  coalescing engine (HCE) if the HCE seems to be busy. Docs seem to
  require waiting for up to several milliseconds for the HCE to
  stabilize, and it is not clear if it is possible for the HCE to never
  stabilize because packets are streaming in. (I don't have proper
  docs.) This seems to always work (the HCE is never busy) for
  rx_max_coal_bds, but something near here didn't work for changing
  rx_coal_ticks in an old version.
- bge_careful_coal = 0 avoids the loop by writing to the rx_max_coal_bds
  register without waiting for the HCE. This seems to work too. It
  isn't critical for the HCE to see the change immediately or even for
  it to be seen at all (missed changes might do no more than give a huge
  interrupt rate for too long), but it is important for the change to
  not break the engine.
There is no sysctl for this or for some other hackish parameters. The
source must be edited to change this from 1 to 0.

Dynamic tuning is turned off by setting the dynamic max interrupt
frequency to 0. Then rx_coal_ticks is reset to 150, and the active
rx_max_coal_bds is restored to the static value.

>>> hw.bge.tx_coal_desc=128
>>>
>>> This value delays the generation of transmit interrupts until specified
>>> number of packets will be transmited. The default value is 10.
>>
>> 128 is a good default. I use 384. There are few latency issues here, so
>> the default of 10 mainly costs efficiency.
>
> Does 384 not delay tx if there is shortage of free tx descriptors ?

No, it just increases the risk of the tx running dry by possibly not
interrupting until there are only a few tx descriptors remaining in the
hardware tx queue. Under load, the interrupt handler and/or bge_start()
normally refills the queue to length 496 (512 less 16 for safety), and
an interrupt arrives 384 descriptors later when the queue length has
been reduced to 112. (My debugging sysctls show this behaviour clearly.)
Then the interrupt must be handled (at least partially) within 112
descriptor times to avoid the tx running dry. This handling is usually
possible. Even 480 works OK, but the throughput drops noticeably near
that value. Under lighter loads, the queue is not completely refilled,
but there is little chance of the tx running dry since OACTIVE is not
set (the queue could only run dry despite there being data to be sent if
unrelated system load prevents threads from running enough to top up the
queue).

Complete patch:
---
% Index: if_bge.c % =================================================================== % RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v % retrieving revision 1.198 % diff -u -2 -r1.198 if_bge.c % --- if_bge.c 30 Sep 2007 11:05:14 -0000 1.198 % +++ if_bge.c 8 Nov 2007 16:01:49 -0000 % @@ -1,2 +1,10 @@ % +int bge_careful_coal = 1; % +int bge_qlen = 1; % +int bge_errsrc = 0x17; % +int bge_rx_repl = 64; % +int bge_coal_writes; % +int bge_coal_write_fails; % +int bge_polling_trust_statusword = 0; % + % /*- % * Copyright (c) 2001 Wind River Systems % @@ -386,4 +394,5 @@ % * traps on certain architectures. % */ % +#define BGE_REGISTER_DEBUG % #ifdef BGE_REGISTER_DEBUG % static int bge_sysctl_debug_info(SYSCTL_HANDLER_ARGS); % @@ -427,4 +436,5 @@ % % static int bge_allow_asf = 1; % +static int bge_return_ring_cnt = BGE_RETURN_RING_CNT; /* XXX global.
*/ % % TUNABLE_INT("hw.bge.allow_asf", &bge_allow_asf); % @@ -867,10 +877,4 @@ % } % % -/* % - * The standard receive ring has 512 entries in it. At 2K per mbuf cluster, % - * that's 1MB or memory, which is a lot. For now, we fill only the first % - * 256 ring entries and hope that our CPU is fast enough to keep up with % - * the NIC. % - */ % static int % bge_init_rx_ring_std(struct bge_softc *sc) % @@ -878,8 +882,8 @@ % int i; % % - for (i = 0; i < BGE_SSLOTS; i++) { % + for (i = 0; i < BGE_STD_RX_RING_CNT; i++) { % if (bge_newbuf_std(sc, i, NULL) == ENOBUFS) % return (ENOBUFS); % - }; % + } % % bus_dmamap_sync(sc->bge_cdata.bge_rx_std_ring_tag, % @@ -922,5 +926,5 @@ % if (bge_newbuf_jumbo(sc, i, NULL) == ENOBUFS) % return (ENOBUFS); % - }; % + } % % bus_dmamap_sync(sc->bge_cdata.bge_rx_jumbo_ring_tag, % @@ -1426,5 +1430,5 @@ % val = 8; % else % - val = BGE_STD_RX_RING_CNT / 8; % + val = BGE_STD_RX_RING_CNT / 8, bge_rx_repl; % CSR_WRITE_4(sc, BGE_RBDI_STD_REPL_THRESH, val); % CSR_WRITE_4(sc, BGE_RBDI_JUMBO_REPL_THRESH, BGE_JUMBO_RX_RING_CNT/8); % @@ -1530,4 +1534,11 @@ % % /* Set up host coalescing defaults */ % + if (sc->bge_dyncoal_max_intr_freq != 0) { % + sc->bge_dyncoal_scale = ((uint64_t)1 << 24) / % + sc->bge_dyncoal_max_intr_freq; % + sc->bge_rx_coal_ticks = BGE_TICKS_PER_SEC / % + sc->bge_dyncoal_max_intr_freq; % + } else % + sc->bge_rx_coal_ticks = 150; % CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks); % CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks); % @@ -2226,4 +2237,53 @@ % % static int % +bge_sysctl_program_coal(SYSCTL_HANDLER_ARGS) % +{ % + struct bge_softc *sc; % + int error, i, val; % + % + val = 0; % + error = sysctl_handle_int(oidp, &val, 0, req); % + if (error != 0 || req->newptr == NULL) % + return (error); % + sc = arg1; % + BGE_LOCK(sc); % + % + /* XXX cut from bge_blockinit(): */ % + % + /* Disable host coalescing until we get it set up */ % + CSR_WRITE_4(sc, BGE_HCC_MODE, 0x00000000); % + % + /* Poll to make 
sure it's shut down. */ % + for (i = 0; i < BGE_TIMEOUT; i++) { % + if (!(CSR_READ_4(sc, BGE_HCC_MODE) & BGE_HCCMODE_ENABLE)) % + break; % + DELAY(10); % + } % + % + if (i == BGE_TIMEOUT) { % + device_printf(sc->bge_dev, % + "host coalescing engine failed to idle\n"); % + CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE); % + BGE_UNLOCK(sc); % + return (ENXIO); % + } % + % + /* Set up host coalescing defaults */ % + if (sc->bge_dyncoal_max_intr_freq != 0) % + sc->bge_dyncoal_scale = ((uint64_t)1 << 24) / % + sc->bge_dyncoal_max_intr_freq; % + CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks); % + CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks); % + CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, sc->bge_rx_max_coal_bds); % + CSR_WRITE_4(sc, BGE_HCC_TX_MAX_COAL_BDS, sc->bge_tx_max_coal_bds); % + % + /* Turn on host coalescing state machine */ % + CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE); % + % + BGE_UNLOCK(sc); % + return (0); % +} % + % +static int % bge_attach(device_t dev) % { % @@ -2444,4 +2504,5 @@ % else % sc->bge_return_ring_cnt = BGE_RETURN_RING_CNT; % + bge_return_ring_cnt = sc->bge_return_ring_cnt; /* XXX */ % % if (bge_dma_alloc(dev)) { % @@ -2454,8 +2515,8 @@ % /* Set default tuneable values. 
*/ % sc->bge_stat_ticks = BGE_TICKS_PER_SEC; % - sc->bge_rx_coal_ticks = 150; % - sc->bge_tx_coal_ticks = 150; % - sc->bge_rx_max_coal_bds = 10; % - sc->bge_tx_max_coal_bds = 10; % + sc->bge_dyncoal_max_intr_freq = 10000; % + sc->bge_tx_coal_ticks = 1000000; % + sc->bge_rx_max_coal_bds = 128; % + sc->bge_tx_max_coal_bds = BGE_TX_RING_CNT * 3 / 4; % % /* Set up ifnet structure */ % @@ -2473,5 +2534,9 @@ % ifp->if_init = bge_init; % ifp->if_mtu = ETHERMTU; % - ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT - 1; % + if (bge_qlen & 1) % + ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT + % + imax(2 * tick, 10000) / 4; % + else % + ifp->if_snd.ifq_drv_maxlen = BGE_TX_RING_CNT - 1; % IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen); % IFQ_SET_READY(&ifp->if_snd); % @@ -2861,4 +2926,55 @@ % } % % +struct bgrstats { % + struct timeval enter; % + struct timeval exit; % + int cnt0; % + int cnt1; % +}; % + % +/* XXX globals without global locking, so don't enable for multiple bge's. */ % + % +static struct bgrstats bgrs[1024]; % + % +static int bgrse; % +SYSCTL_INT(_debug, OID_AUTO, bgrse, CTLFLAG_RW, % + &bgrse, 0, "bge rx stats enable"); % + % +static int bgrso; % +SYSCTL_INT(_debug, OID_AUTO, bgrso, CTLFLAG_RW, % + &bgrso, 0, "bge rx stats offset"); % + % +static int % +sysctl_bgrs(SYSCTL_HANDLER_ARGS) % +{ % + size_t len; % + int error, i, max; % + char buf[256]; % + % + for (i = 1, max = sizeof(bgrs) / sizeof(bgrs[0]); i < max; i++) { % + len = sprintf(buf, % + "%4ld %10ld.%06ld %3d %3ld %3d %10ld.%06ld %3d\n", % + (bgrs[i].enter.tv_sec - bgrs[i - 1].exit.tv_sec) * 1000000 + % + bgrs[i].enter.tv_usec - bgrs[i - 1].exit.tv_usec, % + (long)bgrs[i].enter.tv_sec, bgrs[i].enter.tv_usec, % + bgrs[i].cnt0, % + (bgrs[i].exit.tv_sec - bgrs[i].enter.tv_sec) * 1000000 + % + bgrs[i].exit.tv_usec - bgrs[i].enter.tv_usec, % + (bgrs[i].cnt1 - bgrs[i].cnt0 + bge_return_ring_cnt) % % + bge_return_ring_cnt, % + (long)bgrs[i].exit.tv_sec, bgrs[i].exit.tv_usec, % + bgrs[i].cnt1); % + if 
(i == max - 1) % + buf[len - 1] = '\0'; % + error = SYSCTL_OUT(req, buf, len); % + if (error != 0) % + return (error); % + } % + return (0); % +} % + % +SYSCTL_PROC(_debug, OID_AUTO, bgrs, CTLTYPE_STRING | CTLFLAG_RD, % + 0, 0, sysctl_bgrs, "A", "bge rx stats"); % + % /* % * Frame reception handling. This is called if there's a frame % @@ -2883,4 +2999,9 @@ % return; % % + if (bgrse) { % + microtime(&bgrs[bgrso].enter); % + bgrs[bgrso].cnt0 = sc->bge_rx_saved_considx; % + } % + % ifp = sc->bge_ifp; % % @@ -2953,5 +3074,8 @@ % stdcnt++; % if (cur_rx->bge_flags & BGE_RXBDFLAG_ERROR) { % + if (bge_errsrc & 1) % ifp->if_ierrors++; % + if (bge_errsrc & 8) % + printf("errflag %#x\n", cur_rx->bge_error_flag); % bge_newbuf_std(sc, sc->bge_std, m); % continue; % @@ -2959,4 +3083,5 @@ % if (bge_newbuf_std(sc, sc->bge_std, % NULL) == ENOBUFS) { % + if (bge_errsrc & 2) % ifp->if_ierrors++; % bge_newbuf_std(sc, sc->bge_std, m); % @@ -3036,6 +3161,60 @@ % ifp->if_ierrors += CSR_READ_4(sc, BGE_RXLP_LOCSTAT_IFIN_DROPS); % #endif % + % + if (bgrse) { % + bgrs[bgrso].cnt1 = sc->bge_rx_saved_considx; % + microtime(&bgrs[bgrso].exit); % + bgrso = (bgrso + 1) % (sizeof(bgrs) / sizeof(bgrs[0])); % + } % } % % +struct bgtstats { % + struct timeval enter; % + struct timeval exit; % + int cnt0; % + int cnt1; % +}; % + % +static struct bgtstats bgts[1024]; % + % +static int bgtse; % +SYSCTL_INT(_debug, OID_AUTO, bgtse, CTLFLAG_RW, % + &bgtse, 0, "bge tx stats enable"); % + % +static int bgtso; % +SYSCTL_INT(_debug, OID_AUTO, bgtso, CTLFLAG_RW, % + &bgtso, 0, "bge tx stats offset"); % + % +static int % +sysctl_bgts(SYSCTL_HANDLER_ARGS) % +{ % + size_t len; % + int error, i, max; % + char buf[256]; % + % + for (i = 1, max = sizeof(bgts) / sizeof(bgts[0]); i < max; i++) { % + len = sprintf(buf, % + "%4ld %10ld.%06ld %3d %3ld %3d %10ld.%06ld %3d\n", % + (bgts[i].enter.tv_sec - bgts[i - 1].exit.tv_sec) * 1000000 + % + bgts[i].enter.tv_usec - bgts[i - 1].exit.tv_usec, % + 
(long)bgts[i].enter.tv_sec, bgts[i].enter.tv_usec, % + bgts[i].cnt0, % + (bgts[i].exit.tv_sec - bgts[i].enter.tv_sec) * 1000000 + % + bgts[i].exit.tv_usec - bgts[i].enter.tv_usec, % + bgts[i].cnt0 - bgts[i].cnt1, % + (long)bgts[i].exit.tv_sec, bgts[i].exit.tv_usec, % + bgts[i].cnt1); % + if (i == max - 1) % + buf[len - 1] = '\0'; % + error = SYSCTL_OUT(req, buf, len); % + if (error != 0) % + return (error); % + } % + return (0); % +} % + % +SYSCTL_PROC(_debug, OID_AUTO, bgts, CTLTYPE_STRING | CTLFLAG_RD, % + 0, 0, sysctl_bgts, "A", "bge tx stats"); % + % static void % bge_txeof(struct bge_softc *sc) % @@ -3051,4 +3230,9 @@ % return; % % + if (bgtse) { % + microtime(&bgts[bgtso].enter); % + bgts[bgtso].cnt0 = sc->bge_txcnt; % + } % + % ifp = sc->bge_ifp; % % @@ -3085,4 +3269,10 @@ % if (sc->bge_txcnt == 0) % sc->bge_timer = 0; % + % + if (bgtse) { % + bgts[bgtso].cnt1 = sc->bge_txcnt; % + microtime(&bgts[bgtso].exit); % + bgtso = (bgtso + 1) % (sizeof(bgts) / sizeof(bgts[0])); % + } % } % % @@ -3103,6 +3293,12 @@ % sc->bge_cdata.bge_status_map, BUS_DMASYNC_POSTREAD); % % + /* XXX possible race on switching from interrupt mode. */ % statusword = atomic_readandclear_32( % &sc->bge_ldata.bge_status_block->bge_status); % + if (cmd != POLL_AND_CHECK_STATUS && bge_polling_trust_statusword && % + (statusword & BGE_STATFLAG_UPDATED) == 0) { % + BGE_UNLOCK(sc); % + return; % + } % % bus_dmamap_sync(sc->bge_cdata.bge_status_tag, % @@ -3134,8 +3330,24 @@ % struct bge_softc *sc; % struct ifnet *ifp; % - uint32_t statusword; % + uint32_t macstatus, statusword; % % sc = xsc; % % + /* % + * Quick check without locking or syncing. Since we don't ack the % + * interrupt when we return early, the hardware will repeat the % + * interrupt if we lose a race here. Later we will clear the % + * status, and that needs at least the lock. % + * % + * XXX sc->bge_link_evt and maybe the BCM5700 errata are not handled. 
% + * % + * XXX there is no good order for this check relative to the % + * IFCAP_POLLING one. Since I don't believe in polling, I optimized % + * for !polling. % + */ % + statusword = sc->bge_ldata.bge_status_block->bge_status; % + if ((statusword & BGE_STATFLAG_UPDATED) == 0) % + return; % + % BGE_LOCK(sc); % % @@ -3174,5 +3386,5 @@ % * Do the mandatory PCI flush as well as get the link status. % */ % - statusword = CSR_READ_4(sc, BGE_MAC_STS) & BGE_MACSTAT_LINK_CHANGED; % + macstatus = CSR_READ_4(sc, BGE_MAC_STS); % % /* Make sure the descriptor ring indexes are coherent. */ % @@ -3184,13 +3396,56 @@ % if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 && % sc->bge_chipid != BGE_CHIPID_BCM5700_B2) || % - statusword || sc->bge_link_evt) % + (macstatus & BGE_MACSTAT_LINK_CHANGED) || sc->bge_link_evt) % bge_link_upd(sc); % % if (ifp->if_drv_flags & IFF_DRV_RUNNING) { % - /* Check RX return ring producer/consumer. */ % bge_rxeof(sc); % - % - /* Check TX ring producer/consumer. */ % bge_txeof(sc); % + if (sc->bge_dyncoal_max_intr_freq != 0 && % + ++sc->bge_dyncoal_intrcnt == 16) { % + struct bintime bt; % + uint32_t dpi, pfrac, tfrac, xtime; % + % + binuptime(&bt); % + xtime = (bt.sec << 24) | (bt.frac >> 40); % + pfrac = (ifp->if_ipackets - sc->bge_dyncoal_ipackets) * % + sc->bge_dyncoal_scale; % + tfrac = xtime - sc->bge_dyncoal_xtime; % + sc->bge_dyncoal_rx_pps = % + (ifp->if_ipackets - sc->bge_dyncoal_ipackets) * % + ((uint64_t)1 << 24) / tfrac; % + dpi = pfrac / (tfrac | 2) + 1; % + if (dpi > sc->bge_rx_max_coal_bds) % + dpi = sc->bge_rx_max_coal_bds; % + if (dpi != sc->bge_dyncoal_rx_max_coal_bds) { % + if (bge_careful_coal) { % + CSR_WRITE_4(sc, BGE_HCC_MODE, 0); % + CSR_READ_4(sc, BGE_HCC_MODE); % + if ((CSR_READ_4(sc, BGE_HCC_MODE) & % + BGE_HCCMODE_ENABLE) == 0) { % + CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, % + dpi); % + sc->bge_dyncoal_rx_max_coal_bds = dpi; % + bge_coal_writes++; % + } else % + bge_coal_write_fails++; % + CSR_WRITE_4(sc, BGE_HCC_MODE, % + 
BGE_HCCMODE_ENABLE); % + } else { % + /* % + * XXX not waiting for the engine is needed % + * for efficiency since we reprogram it a % + * lot so as to react fast, and this seems % + * to work. However, similar reprogramming % + * of RX_COAL_TICKS doesn't work. % + */ % + CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, dpi); % + sc->bge_dyncoal_rx_max_coal_bds = dpi; % + } % + } % + sc->bge_dyncoal_xtime = xtime; % + sc->bge_dyncoal_intrcnt = 0; % + sc->bge_dyncoal_ipackets = ifp->if_ipackets; % + } % } % % @@ -3241,7 +3496,15 @@ % if ((sc->bge_flags & BGE_FLAG_TBI) == 0) { % mii = device_get_softc(sc->bge_miibus); % - /* Don't mess with the PHY in IPMI/ASF mode */ % - if (!((sc->bge_asf_mode & ASF_STACKUP) && (sc->bge_link))) % + /* Don't mess with the PHY unless link is down. */ % + if (!sc->bge_link) { % + if (bge_errsrc & 0x20) % + microtime(&bgrs[bgrso].enter); % + if (bge_errsrc & 0x10) % mii_tick(mii); % + if (bge_errsrc & 0x20) { % + microtime(&bgrs[bgrso].exit); % + bgrso = (bgrso + 1) % (sizeof(bgrs) / sizeof(bgrs[0])); % + } % + } % } else { % /* % @@ -3276,4 +3539,5 @@ % offsetof(struct bge_mac_stats_regs, etherStatsCollisions)); % % + if (bge_errsrc & 4) % ifp->if_ierrors += CSR_READ_4(sc, BGE_RXLP_LOCSTAT_IFIN_DROPS); % } % @@ -3298,4 +3562,5 @@ % % cnt = READ_STAT(sc, stats, ifInDiscards.bge_addr_lo); % + if (bge_errsrc & 4) % ifp->if_ierrors += (uint32_t)(cnt - sc->bge_rx_discards); % sc->bge_rx_discards = cnt; % @@ -4266,5 +4531,6 @@ % } % % -#define BGE_SYSCTL_STAT(sc, ctx, desc, parent, node, oid) \ % +/* XXX move this down and fix style bugs in it. 
*/ % +#define BGE_SYSCTL_STAT_GEN(sc, ctx, desc, parent, node, oid) \ % SYSCTL_ADD_PROC(ctx, parent, OID_AUTO, oid, CTLTYPE_UINT|CTLFLAG_RD, \ % sc, offsetof(struct bge_stats, node), bge_sysctl_stats, "IU", \ % @@ -4281,4 +4547,27 @@ % children = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->bge_dev)); % % + SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "program_coal", % + CTLTYPE_INT | CTLFLAG_RW, % + sc, 0, bge_sysctl_program_coal, "I", % + "program bge coalescence values"); % + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_coal_ticks", CTLFLAG_RW, % + &sc->bge_rx_coal_ticks, 0, ""); % + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_coal_ticks", CTLFLAG_RW, % + &sc->bge_tx_coal_ticks, 0, ""); % + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_max_coal_bds", CTLFLAG_RW, % + &sc->bge_rx_max_coal_bds, 0, ""); % + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_max_coal_bds", CTLFLAG_RW, % + &sc->bge_tx_max_coal_bds, 0, ""); % + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_max_intr_freq", % + CTLFLAG_RW, % + &sc->bge_dyncoal_max_intr_freq, 0, ""); % + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_max_coal_bds", % + CTLFLAG_RD, % + &sc->bge_dyncoal_rx_max_coal_bds, 0, ""); % + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_pps", CTLFLAG_RD, % + &sc->bge_dyncoal_rx_pps, 0, ""); % + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_scale", CTLFLAG_RD, % + &sc->bge_dyncoal_scale, 0, ""); % + % #ifdef BGE_REGISTER_DEBUG % SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "debug_info", % @@ -4299,4 +4588,7 @@ % NULL, "BGE Statistics"); % schildren = children = SYSCTL_CHILDREN(tree); % + /* Most of these seem to be unavailable on 5705+. 
*/ % +if (!BGE_IS_5705_PLUS(sc)) { % +#define BGE_SYSCTL_STAT BGE_SYSCTL_STAT_GEN % BGE_SYSCTL_STAT(sc, ctx, "Frames Dropped Due To Filters", % children, COSFramesDroppedDueToFilters, % @@ -4308,4 +4600,8 @@ % BGE_SYSCTL_STAT(sc, ctx, "NIC No More RX Buffer Descriptors", % children, nicNoMoreRxBDs, "NoMoreRxBDs"); % + /* % + * The next one seems to be in BGE_RXLP_LOCSTAT_IFIN_DROPS for % + * the 5705+ case -- bge_stats_update_regs() uses this. % + */ % BGE_SYSCTL_STAT(sc, ctx, "Discarded Input Frames", % children, ifInDiscards, "InputDiscards"); % @@ -4330,86 +4626,126 @@ % BGE_SYSCTL_STAT(sc, ctx, "NIC Send Threshold Hit", % children, nicSendThresholdHit, "SendThresholdHit"); % +} % % tree = SYSCTL_ADD_NODE(ctx, schildren, OID_AUTO, "rx", CTLFLAG_RD, % NULL, "BGE RX Statistics"); % children = SYSCTL_CHILDREN(tree); % + __asm("# label for testing ifHCInOctets"); % + /* % + * Most rx stats are available for the 5705+case, but in a % + * different layout and with different semantics (32 bit registers % + * holding 12 (?) bit values which are reset on write instead of % + * 64-bit registers). We only handle the layout differences, and % + * do that using extremely ugly macros. Resetting of the registers % + * currently makes this sysctl almost useless for the 5705+ ase. % + * % + * The mapping of registers into structs mostly just gets in the % + * way here. % + */ % +#define BGE_SYSCTL_STAT_RX(sc, ctx, desc, parent, node, oid) \ % + SYSCTL_ADD_PROC(ctx, parent, OID_AUTO, oid, \ % + CTLTYPE_UINT | CTLFLAG_RD, sc, \ % + BGE_IS_5705_PLUS(sc) ? 
\ % + offsetof(struct bge_mac_stats_regs, node) : \ % + offsetof(struct bge_stats, rxstats.node), \ % + bge_sysctl_stats, "IU", desc) % +#undef BGE_SYSCTL_STAT % +#define BGE_SYSCTL_STAT BGE_SYSCTL_STAT_RX % + % BGE_SYSCTL_STAT(sc, ctx, "Inbound Octets", % - children, rxstats.ifHCInOctets, "Octets"); % + children, ifHCInOctets, "Octets"); % BGE_SYSCTL_STAT(sc, ctx, "Fragments", % - children, rxstats.etherStatsFragments, "Fragments"); % + children, etherStatsFragments, "Fragments"); % BGE_SYSCTL_STAT(sc, ctx, "Inbound Unicast Packets", % - children, rxstats.ifHCInUcastPkts, "UcastPkts"); % + children, ifHCInUcastPkts, "UcastPkts"); % BGE_SYSCTL_STAT(sc, ctx, "Inbound Multicast Packets", % - children, rxstats.ifHCInMulticastPkts, "MulticastPkts"); % + children, ifHCInMulticastPkts, "MulticastPkts"); % BGE_SYSCTL_STAT(sc, ctx, "FCS Errors", % - children, rxstats.dot3StatsFCSErrors, "FCSErrors"); % + children, dot3StatsFCSErrors, "FCSErrors"); % BGE_SYSCTL_STAT(sc, ctx, "Alignment Errors", % - children, rxstats.dot3StatsAlignmentErrors, "AlignmentErrors"); % + children, dot3StatsAlignmentErrors, "AlignmentErrors"); % BGE_SYSCTL_STAT(sc, ctx, "XON Pause Frames Received", % - children, rxstats.xonPauseFramesReceived, "xonPauseFramesReceived"); % + children, xonPauseFramesReceived, "xonPauseFramesReceived"); % BGE_SYSCTL_STAT(sc, ctx, "XOFF Pause Frames Received", % - children, rxstats.xoffPauseFramesReceived, % - "xoffPauseFramesReceived"); % + children, xoffPauseFramesReceived, "xoffPauseFramesReceived"); % BGE_SYSCTL_STAT(sc, ctx, "MAC Control Frames Received", % - children, rxstats.macControlFramesReceived, % - "ControlFramesReceived"); % + children, macControlFramesReceived, "ControlFramesReceived"); % BGE_SYSCTL_STAT(sc, ctx, "XOFF State Entered", % - children, rxstats.xoffStateEntered, "xoffStateEntered"); % + children, xoffStateEntered, "xoffStateEntered"); % BGE_SYSCTL_STAT(sc, ctx, "Frames Too Long", % - children, rxstats.dot3StatsFramesTooLong, 
"FramesTooLong"); % + children, dot3StatsFramesTooLong, "FramesTooLong"); % BGE_SYSCTL_STAT(sc, ctx, "Jabbers", % - children, rxstats.etherStatsJabbers, "Jabbers"); % + children, etherStatsJabbers, "Jabbers"); % BGE_SYSCTL_STAT(sc, ctx, "Undersized Packets", % - children, rxstats.etherStatsUndersizePkts, "UndersizePkts"); % - BGE_SYSCTL_STAT(sc, ctx, "Inbound Range Length Errors", % + children, etherStatsUndersizePkts, "UndersizePkts"); % + /* The next 2 seem to be unavailable for the 5705 case. */ % +if (!BGE_IS_5705_PLUS(sc)) { % + BGE_SYSCTL_STAT_GEN(sc, ctx, "Inbound Range Length Errors", % children, rxstats.inRangeLengthError, "inRangeLengthError"); % - BGE_SYSCTL_STAT(sc, ctx, "Outbound Range Length Errors", % + BGE_SYSCTL_STAT_GEN(sc, ctx, "Outbound Range Length Errors", % children, rxstats.outRangeLengthError, "outRangeLengthError"); % +} % % tree = SYSCTL_ADD_NODE(ctx, schildren, OID_AUTO, "tx", CTLFLAG_RD, % NULL, "BGE TX Statistics"); % children = SYSCTL_CHILDREN(tree); % + __asm("# label for testing ifHCOutOctets"); % + /* % + * tx is like rx except the macro needs "txstats." instead of % + * ".rxstats" for the non-5705+ variant. Redefine it again % + * to get this. % + */ % +#define BGE_SYSCTL_STAT_TX(sc, ctx, desc, parent, node, oid) \ % + SYSCTL_ADD_PROC(ctx, parent, OID_AUTO, oid, \ % + CTLTYPE_UINT | CTLFLAG_RD, sc, \ % + BGE_IS_5705_PLUS(sc) ? 
\ % + offsetof(struct bge_mac_stats_regs, node) : \ % + offsetof(struct bge_stats, txstats.node), \ % + bge_sysctl_stats, "IU", desc) % +#undef BGE_SYSCTL_STAT % +#define BGE_SYSCTL_STAT BGE_SYSCTL_STAT_TX % + % BGE_SYSCTL_STAT(sc, ctx, "Outbound Octets", % - children, txstats.ifHCOutOctets, "Octets"); % + children, ifHCOutOctets, "Octets"); % BGE_SYSCTL_STAT(sc, ctx, "TX Collisions", % - children, txstats.etherStatsCollisions, "Collisions"); % + children, etherStatsCollisions, "Collisions"); % BGE_SYSCTL_STAT(sc, ctx, "XON Sent", % - children, txstats.outXonSent, "XonSent"); % + children, outXonSent, "XonSent"); % BGE_SYSCTL_STAT(sc, ctx, "XOFF Sent", % - children, txstats.outXoffSent, "XoffSent"); % - BGE_SYSCTL_STAT(sc, ctx, "Flow Control Done", % + children, outXoffSent, "XoffSent"); % +if (!BGE_IS_5705_PLUS(sc)) { % + BGE_SYSCTL_STAT_GEN(sc, ctx, "Flow Control Done", % children, txstats.flowControlDone, "flowControlDone"); % +} % BGE_SYSCTL_STAT(sc, ctx, "Internal MAC TX errors", % - children, txstats.dot3StatsInternalMacTransmitErrors, % + children, dot3StatsInternalMacTransmitErrors, % "InternalMacTransmitErrors"); % BGE_SYSCTL_STAT(sc, ctx, "Single Collision Frames", % - children, txstats.dot3StatsSingleCollisionFrames, % - "SingleCollisionFrames"); % + children, dot3StatsSingleCollisionFrames, "SingleCollisionFrames"); % BGE_SYSCTL_STAT(sc, ctx, "Multiple Collision Frames", % - children, txstats.dot3StatsMultipleCollisionFrames, % + children, dot3StatsMultipleCollisionFrames, % "MultipleCollisionFrames"); % BGE_SYSCTL_STAT(sc, ctx, "Deferred Transmissions", % - children, txstats.dot3StatsDeferredTransmissions, % - "DeferredTransmissions"); % + children, dot3StatsDeferredTransmissions, "DeferredTransmissions"); % BGE_SYSCTL_STAT(sc, ctx, "Excessive Collisions", % - children, txstats.dot3StatsExcessiveCollisions, % - "ExcessiveCollisions"); % + children, dot3StatsExcessiveCollisions, "ExcessiveCollisions"); % BGE_SYSCTL_STAT(sc, ctx, "Late Collisions", % - 
children, txstats.dot3StatsLateCollisions, % - "LateCollisions"); % + children, dot3StatsLateCollisions, "LateCollisions"); % BGE_SYSCTL_STAT(sc, ctx, "Outbound Unicast Packets", % - children, txstats.ifHCOutUcastPkts, "UcastPkts"); % + children, ifHCOutUcastPkts, "UcastPkts"); % BGE_SYSCTL_STAT(sc, ctx, "Outbound Multicast Packets", % - children, txstats.ifHCOutMulticastPkts, "MulticastPkts"); % + children, ifHCOutMulticastPkts, "MulticastPkts"); % BGE_SYSCTL_STAT(sc, ctx, "Outbound Broadcast Packets", % - children, txstats.ifHCOutBroadcastPkts, "BroadcastPkts"); % - BGE_SYSCTL_STAT(sc, ctx, "Carrier Sense Errors", % + children, ifHCOutBroadcastPkts, "BroadcastPkts"); % +if (!BGE_IS_5705_PLUS(sc)) { % + BGE_SYSCTL_STAT_GEN(sc, ctx, "Carrier Sense Errors", % children, txstats.dot3StatsCarrierSenseErrors, % "CarrierSenseErrors"); % - BGE_SYSCTL_STAT(sc, ctx, "Outbound Discards", % + BGE_SYSCTL_STAT_GEN(sc, ctx, "Outbound Discards", % children, txstats.ifOutDiscards, "Discards"); % - BGE_SYSCTL_STAT(sc, ctx, "Outbound Errors", % + BGE_SYSCTL_STAT_GEN(sc, ctx, "Outbound Errors", % children, txstats.ifOutErrors, "Errors"); % } % +} % % static int % @@ -4422,10 +4758,13 @@ % sc = (struct bge_softc *)arg1; % offset = arg2; % - if (BGE_IS_5705_PLUS(sc)) % + if (BGE_IS_5705_PLUS(sc)) { % base = BGE_MAC_STATS; % - else % + result = CSR_READ_4(sc, base + offset); % + } % + else { % base = BGE_MEMWIN_START + BGE_STATS_BLOCK; % - result = CSR_READ_4(sc, base + offset + offsetof(bge_hostaddr, % - bge_addr_lo)); % + result = CSR_READ_4(sc, base + offset + offsetof(bge_hostaddr, % + bge_addr_lo)); % + } % return (sysctl_handle_int(oidp, &result, 0, req)); % } % Index: if_bgereg.h % =================================================================== % RCS file: /home/ncvs/src/sys/dev/bge/if_bgereg.h,v % retrieving revision 1.73 % diff -u -2 -r1.73 if_bgereg.h % --- if_bgereg.h 22 May 2007 19:22:58 -0000 1.73 % +++ if_bgereg.h 23 May 2007 09:12:50 -0000 % @@ -2338,13 +2338,7 @@ % % 
/* % - * Memory management stuff. Note: the SSLOTS, MSLOTS and JSLOTS % - * values are tuneable. They control the actual amount of buffers % - * allocated for the standard, mini and jumbo receive rings. % + * Memory management stuff. % */ % % -#define BGE_SSLOTS 256 % -#define BGE_MSLOTS 256 % -#define BGE_JSLOTS 384 % - % #define BGE_JRAWLEN (BGE_JUMBO_FRAMELEN + ETHER_ALIGN) % #define BGE_JLEN (BGE_JRAWLEN + (sizeof(uint64_t) - \ % @@ -2504,4 +2498,11 @@ % uint32_t bge_tx_discards; % uint32_t bge_tx_collisions; % + int bge_dyncoal_intrcnt; % + u_long bge_dyncoal_ipackets; % + uint32_t bge_dyncoal_max_intr_freq; % + uint32_t bge_dyncoal_rx_max_coal_bds; % + uint32_t bge_dyncoal_rx_pps; % + uint32_t bge_dyncoal_scale; % + uint32_t bge_dyncoal_xtime; % #ifdef DEVICE_POLLING % int rxcycles; --- Edited version (may have deleted too much or too little): --- % Index: if_bge.c % =================================================================== % RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v % retrieving revision 1.198 % diff -u -2 -r1.198 if_bge.c % --- if_bge.c 30 Sep 2007 11:05:14 -0000 1.198 % +++ if_bge.c 8 Nov 2007 16:01:49 -0000 % @@ -1,2 +1,10 @@ % +int bge_careful_coal = 1; % +int bge_qlen = 1; % +int bge_errsrc = 0x17; % +int bge_rx_repl = 64; % +int bge_coal_writes; % +int bge_coal_write_fails; % +int bge_polling_trust_statusword = 0; % + % /*- % * Copyright (c) 2001 Wind River Systems % @@ -867,10 +877,4 @@ % } % % -/* % - * The standard receive ring has 512 entries in it. At 2K per mbuf cluster, % - * that's 1MB or memory, which is a lot. For now, we fill only the first % - * 256 ring entries and hope that our CPU is fast enough to keep up with % - * the NIC. 
% - */
%  static int
%  bge_init_rx_ring_std(struct bge_softc *sc)
% @@ -878,8 +882,8 @@
%  	int i;
%
% -	for (i = 0; i < BGE_SSLOTS; i++) {
% +	for (i = 0; i < BGE_STD_RX_RING_CNT; i++) {
%  		if (bge_newbuf_std(sc, i, NULL) == ENOBUFS)
%  			return (ENOBUFS);
% -	};
% +	}
%
%  	bus_dmamap_sync(sc->bge_cdata.bge_rx_std_ring_tag,
% @@ -1530,4 +1534,11 @@
%
%  	/* Set up host coalescing defaults */
% +	if (sc->bge_dyncoal_max_intr_freq != 0) {
% +		sc->bge_dyncoal_scale = ((uint64_t)1 << 24) /
% +		    sc->bge_dyncoal_max_intr_freq;
% +		sc->bge_rx_coal_ticks = BGE_TICKS_PER_SEC /
% +		    sc->bge_dyncoal_max_intr_freq;
% +	} else
% +		sc->bge_rx_coal_ticks = 150;
%  	CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks);
%  	CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks);
% @@ -2226,4 +2237,53 @@
%
%  static int
% +bge_sysctl_program_coal(SYSCTL_HANDLER_ARGS)
% +{
% +	struct bge_softc *sc;
% +	int error, i, val;
% +
% +	val = 0;
% +	error = sysctl_handle_int(oidp, &val, 0, req);
% +	if (error != 0 || req->newptr == NULL)
% +		return (error);
% +	sc = arg1;
% +	BGE_LOCK(sc);
% +
% +	/* XXX cut from bge_blockinit(): */
% +
% +	/* Disable host coalescing until we get it set up */
% +	CSR_WRITE_4(sc, BGE_HCC_MODE, 0x00000000);
% +
% +	/* Poll to make sure it's shut down. */
% +	for (i = 0; i < BGE_TIMEOUT; i++) {
% +		if (!(CSR_READ_4(sc, BGE_HCC_MODE) & BGE_HCCMODE_ENABLE))
% +			break;
% +		DELAY(10);
% +	}
% +
% +	if (i == BGE_TIMEOUT) {
% +		device_printf(sc->bge_dev,
% +		    "host coalescing engine failed to idle\n");
% +		CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE);
% +		BGE_UNLOCK(sc);
% +		return (ENXIO);
% +	}
% +
% +	/* Set up host coalescing defaults */
% +	if (sc->bge_dyncoal_max_intr_freq != 0)
% +		sc->bge_dyncoal_scale = ((uint64_t)1 << 24) /
% +		    sc->bge_dyncoal_max_intr_freq;
% +	CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks);
% +	CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks);
% +	CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, sc->bge_rx_max_coal_bds);
% +	CSR_WRITE_4(sc, BGE_HCC_TX_MAX_COAL_BDS, sc->bge_tx_max_coal_bds);
% +
% +	/* Turn on host coalescing state machine */
% +	CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE);
% +
% +	BGE_UNLOCK(sc);
% +	return (0);
% +}
% +
% +static int
%  bge_attach(device_t dev)
%  {
% @@ -2454,8 +2515,8 @@
%  	/* Set default tuneable values. */
%  	sc->bge_stat_ticks = BGE_TICKS_PER_SEC;
% -	sc->bge_rx_coal_ticks = 150;
% -	sc->bge_tx_coal_ticks = 150;
% -	sc->bge_rx_max_coal_bds = 10;
% -	sc->bge_tx_max_coal_bds = 10;
% +	sc->bge_dyncoal_max_intr_freq = 10000;
% +	sc->bge_tx_coal_ticks = 1000000;
% +	sc->bge_rx_max_coal_bds = 128;
% +	sc->bge_tx_max_coal_bds = BGE_TX_RING_CNT * 3 / 4;
%
%  	/* Set up ifnet structure */
% @@ -3184,13 +3396,56 @@
%  	if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 &&
%  	    sc->bge_chipid != BGE_CHIPID_BCM5700_B2) ||
% -	    statusword || sc->bge_link_evt)
% +	    statusword || sc->bge_link_evt)
%  		bge_link_upd(sc);
%
%  	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
% -		/* Check RX return ring producer/consumer. */
%  		bge_rxeof(sc);
% -
% -		/* Check TX ring producer/consumer. */
%  		bge_txeof(sc);
% +		if (sc->bge_dyncoal_max_intr_freq != 0 &&
% +		    ++sc->bge_dyncoal_intrcnt == 16) {
% +			struct bintime bt;
% +			uint32_t dpi, pfrac, tfrac, xtime;
% +
% +			binuptime(&bt);
% +			xtime = (bt.sec << 24) | (bt.frac >> 40);
% +			pfrac = (ifp->if_ipackets - sc->bge_dyncoal_ipackets) *
% +			    sc->bge_dyncoal_scale;
% +			tfrac = xtime - sc->bge_dyncoal_xtime;
% +			sc->bge_dyncoal_rx_pps =
% +			    (ifp->if_ipackets - sc->bge_dyncoal_ipackets) *
% +			    ((uint64_t)1 << 24) / tfrac;
% +			dpi = pfrac / (tfrac | 2) + 1;
% +			if (dpi > sc->bge_rx_max_coal_bds)
% +				dpi = sc->bge_rx_max_coal_bds;
% +			if (dpi != sc->bge_dyncoal_rx_max_coal_bds) {
% +				if (bge_careful_coal) {
% +					CSR_WRITE_4(sc, BGE_HCC_MODE, 0);
% +					CSR_READ_4(sc, BGE_HCC_MODE);
% +					if ((CSR_READ_4(sc, BGE_HCC_MODE) &
% +					    BGE_HCCMODE_ENABLE) == 0) {
% +						CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS,
% +						    dpi);
% +						sc->bge_dyncoal_rx_max_coal_bds = dpi;
% +						bge_coal_writes++;
% +					} else
% +						bge_coal_write_fails++;
% +					CSR_WRITE_4(sc, BGE_HCC_MODE,
% +					    BGE_HCCMODE_ENABLE);
% +				} else {
% +					/*
% +					 * XXX not waiting for the engine is needed
% +					 * for efficiency since we reprogram it a
% +					 * lot so as to react fast, and this seems
% +					 * to work. However, similar reprogramming
% +					 * of RX_COAL_TICKS doesn't work.
% +					 */
% +					CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, dpi);
% +					sc->bge_dyncoal_rx_max_coal_bds = dpi;
% +				}
% +			}
% +			sc->bge_dyncoal_xtime = xtime;
% +			sc->bge_dyncoal_intrcnt = 0;
% +			sc->bge_dyncoal_ipackets = ifp->if_ipackets;
% +		}
%  	}
%
% @@ -4281,4 +4547,27 @@
%  	children = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->bge_dev));
%
% +	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "program_coal",
% +	    CTLTYPE_INT | CTLFLAG_RW,
% +	    sc, 0, bge_sysctl_program_coal, "I",
% +	    "program bge coalescence values");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_coal_ticks", CTLFLAG_RW,
% +	    &sc->bge_rx_coal_ticks, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_coal_ticks", CTLFLAG_RW,
% +	    &sc->bge_tx_coal_ticks, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_max_coal_bds", CTLFLAG_RW,
% +	    &sc->bge_rx_max_coal_bds, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_max_coal_bds", CTLFLAG_RW,
% +	    &sc->bge_tx_max_coal_bds, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_max_intr_freq",
% +	    CTLFLAG_RW,
% +	    &sc->bge_dyncoal_max_intr_freq, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_max_coal_bds",
% +	    CTLFLAG_RD,
% +	    &sc->bge_dyncoal_rx_max_coal_bds, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_pps", CTLFLAG_RD,
% +	    &sc->bge_dyncoal_rx_pps, 0, "");
% +	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_scale", CTLFLAG_RD,
% +	    &sc->bge_dyncoal_scale, 0, "");
% +
%  #ifdef BGE_REGISTER_DEBUG
%  	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "debug_info",
% Index: if_bgereg.h
% ===================================================================
% RCS file: /home/ncvs/src/sys/dev/bge/if_bgereg.h,v
% retrieving revision 1.73
% diff -u -2 -r1.73 if_bgereg.h
% --- if_bgereg.h	22 May 2007 19:22:58 -0000	1.73
% +++ if_bgereg.h	23 May 2007 09:12:50 -0000
% @@ -2338,13 +2338,7 @@
%
%  /*
% - * Memory management stuff. Note: the SSLOTS, MSLOTS and JSLOTS
% - * values are tuneable. They control the actual amount of buffers
% - * allocated for the standard, mini and jumbo receive rings.
% + * Memory management stuff.
%   */
%
% -#define BGE_SSLOTS	256
% -#define BGE_MSLOTS	256
% -#define BGE_JSLOTS	384
% -
%  #define BGE_JRAWLEN (BGE_JUMBO_FRAMELEN + ETHER_ALIGN)
%  #define BGE_JLEN (BGE_JRAWLEN + (sizeof(uint64_t) - \
% @@ -2504,4 +2498,11 @@
%  	uint32_t		bge_tx_discards;
%  	uint32_t		bge_tx_collisions;
% +	int			bge_dyncoal_intrcnt;
% +	u_long			bge_dyncoal_ipackets;
% +	uint32_t		bge_dyncoal_max_intr_freq;
% +	uint32_t		bge_dyncoal_rx_max_coal_bds;
% +	uint32_t		bge_dyncoal_rx_pps;
% +	uint32_t		bge_dyncoal_scale;
% +	uint32_t		bge_dyncoal_xtime;
%  #ifdef DEVICE_POLLING
%  	int			rxcycles;
---

Simple shell program for micro-adjusting parameters interactively (would
be easier using a mouse, but I don't like GUI programming, and the
parameter space is really too large to investigate manually):

---
#!/bin/sh

netstat=netstat
rx_coal_ticks=$(sysctl -n dev.bge.0.rx_coal_ticks)
rx_max_coal_bds=$(sysctl -n dev.bge.0.rx_max_coal_bds)
tx_coal_ticks=$(sysctl -n dev.bge.0.tx_coal_ticks)
tx_max_coal_bds=$(sysctl -n dev.bge.0.tx_max_coal_bds)
max_intr_freq=$(sysctl -n dev.bge.0.dyncoal_max_intr_freq)
drxbds=0
drxticks=0
dtxbds=0
dtxticks=0
while :
do
	printf \
"rx ticks %d, rx bds %d, tx ticks %d, tx bds %d, freq %d, dyn bds %d\n" \
	    $rx_coal_ticks $rx_max_coal_bds $tx_coal_ticks $tx_max_coal_bds \
	    $max_intr_freq $(sysctl -n dev.bge.0.dyncoal_rx_max_coal_bds)
	# ($netstat -I bge0 1 | head -3 | tail -1) 2>/dev/null
	sysctl dev.bge.0.rx_coal_ticks=$rx_coal_ticks >/dev/null
	sysctl dev.bge.0.tx_coal_ticks=$tx_coal_ticks >/dev/null
	sysctl dev.bge.0.rx_max_coal_bds=$rx_max_coal_bds >/dev/null
	sysctl dev.bge.0.tx_max_coal_bds=$tx_max_coal_bds >/dev/null
	sysctl dev.bge.0.program_coal=0 >/dev/null
	read x
	case "$x" in
	0)	drxticks=0; drxbds=0; dtxticks=0; dtxbds=0 ;;
	H)	drxticks=$(($drxticks - 1)) ;;
	J)	drxbds=$(($drxbds - 1)) ;;
	K)	drxbds=$(($drxbds + 1)) ;;
	L)	drxticks=$(($drxticks + 1)) ;;
	h)	dtxticks=$(($dtxticks - 1)) ;;
	j)	dtxbds=$(($dtxbds - 1)) ;;
	k)	dtxbds=$(($dtxbds + 1)) ;;
	l)	dtxticks=$(($dtxticks + 1)) ;;
	n)	($netstat -I bge0 1 | head -3 | tail -1) 2>/dev/null
	esac
	rx_coal_ticks=$(($rx_coal_ticks + $drxticks))
	rx_max_coal_bds=$(($rx_max_coal_bds + $drxbds))
	tx_coal_ticks=$(($tx_coal_ticks + $dtxticks))
	tx_max_coal_bds=$(($tx_max_coal_bds + $dtxbds))
done
---

Bruce
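
Note for readers of the dyncoal hunk (the one that computes dpi): the
arithmetic can be tried outside the kernel. What follows is only a rough
sketch and not part of the patch; the function dyncoal_bds() and its
parameter names are made up for illustration. It assumes the same 24-bit
fixed-point scaling as the patch (tfrac is elapsed time in units of
2^-24 s), and it keeps the patch's 32-bit wraparound and the (tfrac | 2)
guard against dividing by zero.

```c
#include <assert.h>
#include <stdint.h>

#define	DYNCOAL_SHIFT	24	/* fractional bits, as in the patch's xtime */

/*
 * Hypothetical userland stand-in for the dpi computation in the patch.
 * dpackets:      packets received since the previous sample
 * tfrac:         elapsed time since the previous sample, in 2^-24 s
 * max_intr_freq: target maximum rx interrupt rate (must be nonzero)
 * max_bds:       upper bound, playing the role of bge_rx_max_coal_bds
 */
static uint32_t
dyncoal_bds(uint32_t dpackets, uint32_t tfrac, uint32_t max_intr_freq,
    uint32_t max_bds)
{
	uint32_t dpi, pfrac, scale;

	assert(max_intr_freq != 0);
	/* Same as bge_dyncoal_scale: 2^24 / max_intr_freq, truncated. */
	scale = (uint32_t)(((uint64_t)1 << DYNCOAL_SHIFT) / max_intr_freq);
	/* Wraps modulo 2^32 for very large dpackets, like the patch. */
	pfrac = dpackets * scale;
	/* Roughly (packets per second) / max_intr_freq, plus 1. */
	dpi = pfrac / (tfrac | 2) + 1;
	if (dpi > max_bds)
		dpi = max_bds;
	return (dpi);
}
```

With the patch's default of 10000 interrupts/s and a limit of 128,
100000 packets in one second (tfrac = 1 << 24) gives 10 descriptors per
interrupt, an idle interval gives 1, and 2000000 packets in one second
is clamped to 128. Since xtime keeps only 8 bits of whole seconds, the
time difference is meaningful only for sample intervals under 256 s.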