Date: Fri, 19 Apr 2013 21:32:38 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: sbruno@FreeBSD.org Cc: "pyunyh@gmail.com" <pyunyh@gmail.com>, "freebsd-net@freebsd.org" <freebsd-net@FreeBSD.org>, bde <bde@FreeBSD.org>, Bruce Evans <brde@optusnet.com.au> Subject: Re: bge(4) sysctl tuneables -- a blast from the past. more knobs! MORE! Message-ID: <20130419203629.N1028@besplex.bde.org> In-Reply-To: <1366348388.1389.11.camel@localhost> References: <1365781568.1418.1.camel@localhost> <20130413200512.G1165@besplex.bde.org> <1366065356.1350.7.camel@localhost> <20130416152121.G904@besplex.bde.org> <1366348388.1389.11.camel@localhost>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 18 Apr 2013, Sean Bruno wrote: > Version 0.2 of patches to bge(4). I'm not totally happy with it, but > comments welcome. I need better explanations of usage for the man page. > > I've dropped bge_rxd completely here as it was suggested to not even > bother adjusting it. > > http://people.freebsd.org/~sbruno/bge_config_update_1.txt I dislike this. It is very bloated to have a SYSCTL_PROC() for every setting. Yet the SYSCTL_PROC()'s aren't even correct. They write directly to the device registers with no locking. Further bloat is given by a tunable backing every sysctl. Style bugs include verbose descriptions and obfuscation the strings for these. I prefer my version of course. It allows setting all the values using SYSCTL_INT()s. It is a feature that these settings are to memory variables only. This allows building up a complicated set of changes without having to ensure that all intermediate steps give a valid and/ or usable hardware setting at every stage. Then there is a SYSCTL_PROC() for the operation of transferring the values to the hardware. This sysctl isn't missing locking, and doesn't just write to the device registers, but reprograms the coal hardware like a device reset would. Direct writing to the registers would probably work except possibly for transient problems, and but the hardware doesn't seem to guarantee this and there are no benefits from current corners here. The little SYSCTL_PROC()s could be used to do range checks, but range checks would not be very useful and none are done now. Range checks of individual values as they are set would defeat building up complicated sets of changes in another way. Many combinations of values are invalid, but it is hard to say which, and it would be wrong to disallow intermediate invalid combinations that won't ever be used. All that can be done easily is to range check each parameter from its min to its max. This is not very useful, since combinations with everything min or everything max are probably just as dysfunctional as some slightly out of bounds values. >> There are only 2 bge tunables now, and they only have half of these bugs: >> - hw.bge.allow_asf is in the wrong namespace >> - hw.bge.allow_asf is too global > Maintainer disagrees with this. I think he agreed that this were wrong too. It is at least missing the verboseness (5 lines for this; 26 lines for each of 4 new sysctls). Anyway, the old bugs should be fixed separately. Here are my old coal patches for bge. They have been edited from a larger patch and may have editing errors. @ diff -c2 ./dev/bge/if_bge.c~ ./dev/bge/if_bge.c @ *** ./dev/bge/if_bge.c~ Fri May 16 16:39:01 2008 @ --- ./dev/bge/if_bge.c Thu Sep 30 19:51:38 2010 @ *************** @ *** 1,2 **** @ --- 1,10 ---- @ + int bge_careful_coal = 1; @ + int bge_qlen = 1; @ + int bge_errsrc = 0x17; @ + int bge_rx_repl = 64; @ + int bge_coal_writes; @ + int bge_coal_write_fails; @ + int bge_polling_trust_statusword = 0; @ + @ /*- @ * Copyright (c) 2001 Wind River Systems Debugging variables. bge_careful_coal selects a way of directly writing rx_max_coal_bds that is not completely sloppy though it doesn't busy-wait to synchronize. @ *************** @ *** 1661,1664 **** @ --- 1665,1675 ---- @ @ /* Set up host coalescing defaults */ @ + if (sc->bge_dyncoal_max_intr_freq != 0) { @ + sc->bge_dyncoal_scale = ((uint64_t)1 << 24) / @ + sc->bge_dyncoal_max_intr_freq; @ + sc->bge_rx_coal_ticks = BGE_TICKS_PER_SEC / @ + sc->bge_dyncoal_max_intr_freq; @ + } else @ + sc->bge_rx_coal_ticks = 150; @ CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks); @ CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks); @ *************** @ *** 2351,2354 **** @ --- 2362,2414 ---- @ @ static int @ + bge_sysctl_program_coal(SYSCTL_HANDLER_ARGS) @ + { @ + struct bge_softc *sc; @ + int error, i, val; @ + @ + val = 0; @ + error = sysctl_handle_int(oidp, &val, 0, req); @ + if (error != 0 || req->newptr == NULL) @ + return (error); @ + sc = arg1; @ + BGE_LOCK(sc); @ + @ + /* XXX cut from bge_blockinit(): */ @ + @ + /* Disable host coalescing until we get it set up */ @ + CSR_WRITE_4(sc, BGE_HCC_MODE, 0x00000000); @ + @ + /* Poll to make sure it's shut down. */ @ + for (i = 0; i < BGE_TIMEOUT; i++) { @ + if (!(CSR_READ_4(sc, BGE_HCC_MODE) & BGE_HCCMODE_ENABLE)) @ + break; @ + DELAY(10); @ + } @ + @ + if (i == BGE_TIMEOUT) { @ + device_printf(sc->bge_dev, @ + "host coalescing engine failed to idle\n"); @ + CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE); @ + BGE_UNLOCK(sc); @ + return (ENXIO); @ + } @ + @ + /* Set up host coalescing defaults */ @ + if (sc->bge_dyncoal_max_intr_freq != 0) @ + sc->bge_dyncoal_scale = ((uint64_t)1 << 24) / @ + sc->bge_dyncoal_max_intr_freq; @ + CSR_WRITE_4(sc, BGE_HCC_RX_COAL_TICKS, sc->bge_rx_coal_ticks); @ + CSR_WRITE_4(sc, BGE_HCC_TX_COAL_TICKS, sc->bge_tx_coal_ticks); @ + CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, sc->bge_rx_max_coal_bds); @ + CSR_WRITE_4(sc, BGE_HCC_TX_MAX_COAL_BDS, sc->bge_tx_max_coal_bds); @ + @ + /* Turn on host coalescing state machine */ @ + CSR_WRITE_4(sc, BGE_HCC_MODE, BGE_HCCMODE_ENABLE); @ + @ + BGE_UNLOCK(sc); @ + return (0); @ + } @ + @ + static int @ bge_attach(device_t dev) @ { @ *************** @ *** 2584,2591 **** @ /* Set default tuneable values. */ @ sc->bge_stat_ticks = BGE_TICKS_PER_SEC; @ ! sc->bge_rx_coal_ticks = 150; @ ! sc->bge_tx_coal_ticks = 150; @ ! sc->bge_rx_max_coal_bds = 10; @ ! sc->bge_tx_max_coal_bds = 10; @ @ /* Set up ifnet structure */ @ --- 2645,2652 ---- @ /* Set default tuneable values. */ @ sc->bge_stat_ticks = BGE_TICKS_PER_SEC; @ ! sc->bge_dyncoal_max_intr_freq = 10000; @ ! sc->bge_tx_coal_ticks = 1000000; @ ! sc->bge_rx_max_coal_bds = 128; @ ! sc->bge_tx_max_coal_bds = BGE_TX_RING_CNT * 3 / 4; @ @ /* Set up ifnet structure */ @ *************** @ *** 3331,3343 **** @ if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 && @ sc->bge_chipid != BGE_CHIPID_BCM5700_B2) || @ ! statusword || sc->bge_link_evt) @ bge_link_upd(sc); @ @ if (ifp->if_drv_flags & IFF_DRV_RUNNING) { @ - /* Check RX return ring producer/consumer. */ @ bge_rxeof(sc); @ - @ - /* Check TX ring producer/consumer. */ @ bge_txeof(sc); @ } @ @ --- 3583,3638 ---- @ if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 && @ sc->bge_chipid != BGE_CHIPID_BCM5700_B2) || @ ! (macstatus & BGE_MACSTAT_LINK_CHANGED) || sc->bge_link_evt) @ bge_link_upd(sc); @ @ if (ifp->if_drv_flags & IFF_DRV_RUNNING) { @ bge_rxeof(sc); @ bge_txeof(sc); @ + if (sc->bge_dyncoal_max_intr_freq != 0 && @ + ++sc->bge_dyncoal_intrcnt == 16) { @ + struct bintime bt; @ + uint32_t dpi, pfrac, tfrac, xtime; @ + @ + binuptime(&bt); @ + xtime = (bt.sec << 24) | (bt.frac >> 40); @ + pfrac = (ifp->if_ipackets - sc->bge_dyncoal_ipackets) * @ + sc->bge_dyncoal_scale; @ + tfrac = xtime - sc->bge_dyncoal_xtime; @ + sc->bge_dyncoal_rx_pps = @ + (ifp->if_ipackets - sc->bge_dyncoal_ipackets) * @ + ((uint64_t)1 << 24) / tfrac; @ + dpi = pfrac / (tfrac | 2) + 1; @ + if (dpi > sc->bge_rx_max_coal_bds) @ + dpi = sc->bge_rx_max_coal_bds; @ + if (dpi != sc->bge_dyncoal_rx_max_coal_bds) { @ + if (bge_careful_coal) { @ + CSR_WRITE_4(sc, BGE_HCC_MODE, 0); @ + CSR_READ_4(sc, BGE_HCC_MODE); @ + if ((CSR_READ_4(sc, BGE_HCC_MODE) & @ + BGE_HCCMODE_ENABLE) == 0) { @ + CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, @ + dpi); @ + sc->bge_dyncoal_rx_max_coal_bds = dpi; @ + bge_coal_writes++; @ + } else @ + bge_coal_write_fails++; @ + CSR_WRITE_4(sc, BGE_HCC_MODE, @ + BGE_HCCMODE_ENABLE); @ + } else { @ + /* @ + * XXX not waiting for the engine is needed @ + * for efficiency since we reprogram it a @ + * lot so as to react fast, and this seems @ + * to work. However, similar reprogramming @ + * of RX_COAL_TICKS doesn't work. @ + */ @ + CSR_WRITE_4(sc, BGE_HCC_RX_MAX_COAL_BDS, dpi); @ + sc->bge_dyncoal_rx_max_coal_bds = dpi; @ + } @ + } @ + sc->bge_dyncoal_xtime = xtime; @ + sc->bge_dyncoal_intrcnt = 0; @ + sc->bge_dyncoal_ipackets = ifp->if_ipackets; @ + } @ } @ @ *************** @ *** 4450,4453 **** @ --- 4756,4782 ---- @ children = SYSCTL_CHILDREN(device_get_sysctl_tree(sc->bge_dev)); @ @ + SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "program_coal", @ + CTLTYPE_INT | CTLFLAG_RW, @ + sc, 0, bge_sysctl_program_coal, "I", @ + "program bge coalescence values"); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_coal_ticks", CTLFLAG_RW, @ + &sc->bge_rx_coal_ticks, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_coal_ticks", CTLFLAG_RW, @ + &sc->bge_tx_coal_ticks, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rx_max_coal_bds", CTLFLAG_RW, @ + &sc->bge_rx_max_coal_bds, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_max_coal_bds", CTLFLAG_RW, @ + &sc->bge_tx_max_coal_bds, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_max_intr_freq", @ + CTLFLAG_RW, @ + &sc->bge_dyncoal_max_intr_freq, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_max_coal_bds", @ + CTLFLAG_RD, @ + &sc->bge_dyncoal_rx_max_coal_bds, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_rx_pps", CTLFLAG_RD, @ + &sc->bge_dyncoal_rx_pps, 0, ""); @ + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "dyncoal_scale", CTLFLAG_RD, @ + &sc->bge_dyncoal_scale, 0, ""); @ + @ #ifdef BGE_REGISTER_DEBUG @ SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "debug_info", @ diff -c2 ./dev/bge/if_bgereg.h~ ./dev/bge/if_bgereg.h @ *** ./dev/bge/if_bgereg.h~ Fri May 16 16:39:21 2008 @ --- ./dev/bge/if_bgereg.h Fri May 16 16:39:23 2008 @ *************** @ *** 2579,2582 **** @ --- 2573,2583 ---- @ uint32_t bge_tx_discards; @ uint32_t bge_tx_collisions; @ + int bge_dyncoal_intrcnt; @ + u_long bge_dyncoal_ipackets; @ + uint32_t bge_dyncoal_max_intr_freq; @ + uint32_t bge_dyncoal_rx_max_coal_bds; @ + uint32_t bge_dyncoal_rx_pps; @ + uint32_t bge_dyncoal_scale; @ + uint32_t bge_dyncoal_xtime; @ #ifdef DEVICE_POLLING @ int rxcycles; The dynamical coal programming hasn't been mentioned recently, but really, it is the only useful thing in the above. Once the system does correct dynamic coal programming, there is no need for static sysctls except to reprove that they can't do such a good job as dynamic programming. I don't bother with dynamic coal program for tx. It would be possible to switch to more conservative tx settings under heavy rx load, and if you know too much about the device internals then balance all the resource uses so that they compete with each other as little as possible, but the possible gains from that are small. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130419203629.N1028>