From: Per Gregers Bilse
Date: Thu, 12 Oct 2006 20:01:30 +0100
Organization: Network Signature Ltd
To: freebsd-net@freebsd.org
Subject: bge(4) interface link state changes, bad flaps
Message-Id: <200610121901.k9CJ1Ul22300@wraith.qbfox.com>

[I guess I should introduce myself before posting, but I'm a bit
pushed for time.]

Some weeks ago I decided to upgrade to 6.x in an attempt to get rid of
some unrelated problems, and ever since then I've seen fairly bad flaps
on fibre Broadcom interfaces. I notice other people have had similar
problems.

The problem occurs with all if_bge.c versions less than maybe a year
old (I haven't investigated in detail). Cutting a long story short, it
seems that somewhere along the line some code accounting for encoding
errors incorrectly showing up as link state changes has been lost. Try
looking in older code (or using your favourite search engine) for
"Sometimes PCS encoding errors are detected" (full text below).

I tried re-importing the code into if_bge.c 1.150 but it didn't work as
well as I had hoped (it did make a difference, though). I'm not sure
if the problems are due to further bugs in my chips (BCM5703 rev 1002)
or something else, but I did try new fibre connections without it
making a difference, so I'm guessing it's the chips.

Anyway, what has worked best for me is the fairly simple solution
below, simply checking the status register for
BGE_MACSTAT_TBI_SIGNAL_DETECT (checking for BGE_MACSTAT_TBI_PCS_SYNCHED
still causes significant flaps) and resetting when the link goes down.

static void
bge_link_upd(struct bge_softc *sc)
{
	struct mii_data *mii;
	uint32_t link, status;

	[...]

	if (sc->bge_flags & BGE_FLAG_TBI) {
		/*
		 * Sometimes PCS encoding errors are detected in
		 * TBI mode (on fiber NICs), and for some reason
		 * the chip will signal them as link changes.
		 * If we get a link change event, but the 'PCS
		 * encoding error' bit in the MAC status register
		 * is set, don't bother doing a link check.
		 * This avoids spurious "link UP" messages
		 * that sometimes appear on fiber NICs during
		 * periods of heavy traffic. (There should be no
		 * effect on copper NICs.)
		 *
		 * If we do have a copper NIC then
		 * check that the AUTOPOLL bit is set before
		 * processing the event as a real link change.
		 * Turning AUTOPOLL on and off in the MII read/write
		 * functions will often trigger a link status
		 * interrupt for no reason.
		 */
		status = CSR_READ_4(sc, BGE_MAC_STS);
		if (!!(status & BGE_MACSTAT_TBI_SIGNAL_DETECT) != sc->bge_link) {
			if (status & BGE_MACSTAT_TBI_SIGNAL_DETECT) {
				sc->bge_link = 1;
				if (sc->bge_asicrev == BGE_ASICREV_BCM5704)
					BGE_CLRBIT(sc, BGE_MAC_MODE,
					    BGE_MACMODE_TBI_SEND_CFGS);
				if (bootverbose)
					if_printf(sc->bge_ifp, "link UP\n");
				if_link_state_change(sc->bge_ifp,
				    LINK_STATE_UP);
			} else {
				sc->bge_link = 0;
				if (bootverbose)
					if_printf(sc->bge_ifp, "link DOWN\n");
				if_link_state_change(sc->bge_ifp,
				    LINK_STATE_DOWN);
				bge_ifmedia_upd_locked(sc->bge_ifp);
			}
		}
	/* Discard link events for MII/GMII cards if MI auto-polling disabled */
	} else if (CSR_READ_4(sc, BGE_MI_MODE) & BGE_MIMODE_AUTOPOLL) {
		[...]
	}

	/* Clear the attention. */
	CSR_WRITE_4(sc, BGE_MAC_STS, BGE_MACSTAT_SYNC_CHANGED|
	    BGE_MACSTAT_CFG_CHANGED|BGE_MACSTAT_MI_COMPLETE|
	    BGE_MACSTAT_PORT_DECODE_ERROR|
	    BGE_MACSTAT_LINK_CHANGED);
}

Maybe not the finest solution, but it works much better than the stock
code; fixes/thoughts/improvements welcome.

Best,

  -- Per
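
PS. For anyone who wants to play with the idea outside the kernel, here
is a rough userland sketch of just the comparison in the TBI branch
above: the signal-detect bit is normalised to 0/1 and a change is
reported only when it differs from the stored link state. Everything
named sim_* or SIM_* is made up for illustration, including the bit
values; the real definitions live in if_bgereg.h and are not
reproduced here.

```c
#include <stdint.h>

/* Hypothetical status bits; NOT the real BGE_MACSTAT_* values. */
#define SIM_PCS_SYNCHED    0x00000001u
#define SIM_SIGNAL_DETECT  0x00000002u
#define SIM_DECODE_ERROR   0x00000400u

/* Hypothetical stand-in for the two softc fields the check needs. */
struct sim_softc {
	int link;         /* last reported state: 0 = down, 1 = up */
	int transitions;  /* number of state changes reported so far */
};

/*
 * Report a link change only when the signal-detect bit disagrees
 * with the stored state.  Other bits in 'status' (PCS sync, decode
 * errors) are deliberately ignored.  Returns 1 if a change was
 * reported, 0 otherwise.
 */
static int
sim_link_upd(struct sim_softc *sc, uint32_t status)
{
	int signal = !!(status & SIM_SIGNAL_DETECT);

	if (signal == sc->link)
		return (0);

	sc->link = signal;
	sc->transitions++;
	return (1);
}
```

Feeding it a status word with the signal bit set reports one "up";
repeating the call with decode-error or PCS-sync bits toggling reports
nothing further until the signal bit itself drops. The real function
additionally calls bge_ifmedia_upd_locked() on the way down, which the
sketch leaves out.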