Date:      Thu, 12 Oct 2006 20:01:30 +0100
From:      Per Gregers Bilse <bilse@networksignature.com>
To:        freebsd-net@freebsd.org
Subject:   bge(4) interface link state changes, bad flaps
Message-ID:  <200610121901.k9CJ1Ul22300@wraith.qbfox.com>

[I guess I should introduce myself before posting, but I'm a bit
pushed for time.]

Some weeks ago I decided to upgrade to 6.x in an attempt to get rid
of some unrelated problems, and ever since then I've seen fairly bad
link flaps on fibre Broadcom interfaces.  I notice other people have
reported similar problems.  The problem occurs with every if_bge.c
revision from roughly the last year (I haven't investigated in detail).

Cutting a long story short, it seems that somewhere along the line the
code that kept PCS encoding errors from showing up as spurious link state
changes has been lost.  Try looking in older code (or using your favourite
search engine) for "Sometimes PCS encoding errors are detected" (the full
comment is quoted below).
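For reference, the filter that the older comment describes (drop a
link-change event when the 'PCS encoding error' bit is also set) can be
modeled as a tiny decision function.  This is a hedged userland sketch,
not the original driver code: SIM_MACSTAT_LINK_CHANGED,
SIM_MACSTAT_PCS_ENC_ERROR and should_check_link() are hypothetical names,
and the bit values are placeholders rather than the real BGE_MACSTAT_*
definitions from if_bgereg.h.

```c
#include <stdint.h>

/*
 * Hypothetical placeholder bits for illustration only; the real
 * BGE_MACSTAT_* values are defined in if_bgereg.h.
 */
#define SIM_MACSTAT_LINK_CHANGED	0x00001000u
#define SIM_MACSTAT_PCS_ENC_ERROR	0x00000400u

/*
 * Return 1 if a MAC status value should trigger a real link check,
 * or 0 if the event should be ignored.  A link-change event that
 * arrives with the PCS encoding-error bit set is treated as spurious.
 */
static int
should_check_link(uint32_t status)
{
	/* No link-change event at all: nothing to do. */
	if ((status & SIM_MACSTAT_LINK_CHANGED) == 0)
		return (0);
	/* Encoding error masquerading as a link change: skip the check. */
	if (status & SIM_MACSTAT_PCS_ENC_ERROR)
		return (0);
	return (1);
}
```

For example, should_check_link(SIM_MACSTAT_LINK_CHANGED |
SIM_MACSTAT_PCS_ENC_ERROR) returns 0, so the event is dropped, while a
plain SIM_MACSTAT_LINK_CHANGED event returns 1 and leads to a link check.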

I tried re-importing that code into if_bge.c 1.150, but it didn't work as
well as I had hoped (it did make a difference, though).  I'm not sure
whether the remaining problems are due to further bugs in my chips
(BCM5703 rev 1002) or something else, but I did try new fibre connections
and they made no difference, so I'm guessing it's the chips.  Anyway, what
has worked best for me is the fairly simple solution below: check the MAC
status register for BGE_MACSTAT_TBI_SIGNAL_DETECT (checking for
BGE_MACSTAT_TBI_PCS_SYNCHED instead still causes significant flaps), and
reset the media with bge_ifmedia_upd_locked() when the link goes down.

static void
bge_link_upd(struct bge_softc *sc)
{
	struct mii_data *mii;
	uint32_t link, status;
[...]
	if (sc->bge_flags & BGE_FLAG_TBI) {
		/*
		 * Sometimes PCS encoding errors are detected in
		 * TBI mode (on fiber NICs), and for some reason
		 * the chip will signal them as link changes.
		 * If we get a link change event, but the 'PCS
		 * encoding error' bit in the MAC status register
		 * is set, don't bother doing a link check.
		 * This avoids spurious "link UP" messages
		 * that sometimes appear on fiber NICs during
		 * periods of heavy traffic. (There should be no
		 * effect on copper NICs.)
		 *
		 * If we do have a copper NIC then
		 * check that the AUTOPOLL bit is set before
		 * processing the event as a real link change.
		 * Turning AUTOPOLL on and off in the MII read/write
		 * functions will often trigger a link status
		 * interrupt for no reason.
		 */
		/*
		 * Track link state from the TBI signal-detect bit only;
		 * report a change only when it differs from bge_link.
		 */
		status = CSR_READ_4(sc, BGE_MAC_STS);
		if (!!(status & BGE_MACSTAT_TBI_SIGNAL_DETECT) !=
		    sc->bge_link) {
			if (status & BGE_MACSTAT_TBI_SIGNAL_DETECT) {
				sc->bge_link = 1;
				if (sc->bge_asicrev == BGE_ASICREV_BCM5704)
					BGE_CLRBIT(sc, BGE_MAC_MODE,
					    BGE_MACMODE_TBI_SEND_CFGS);
				if (bootverbose)
					if_printf(sc->bge_ifp, "link UP\n");
				if_link_state_change(sc->bge_ifp,
				    LINK_STATE_UP);
			} else {
				sc->bge_link = 0;
				if (bootverbose)
					if_printf(sc->bge_ifp, "link DOWN\n");
				if_link_state_change(sc->bge_ifp,
				    LINK_STATE_DOWN);
				/* Reset the media when the link goes down. */
				bge_ifmedia_upd_locked(sc->bge_ifp);
			}
		}
	/* Discard link events for MII/GMII cards if MI auto-polling disabled */
	} else if (CSR_READ_4(sc, BGE_MI_MODE) & BGE_MIMODE_AUTOPOLL) {
[...]
	}

	/* Clear the attention. */
	CSR_WRITE_4(sc, BGE_MAC_STS, BGE_MACSTAT_SYNC_CHANGED|
	    BGE_MACSTAT_CFG_CHANGED|BGE_MACSTAT_MI_COMPLETE|
	    BGE_MACSTAT_PORT_DECODE_ERROR|
	    BGE_MACSTAT_LINK_CHANGED);
}
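To illustrate why keying on signal detect alone suppresses the flaps, the
TBI decision logic of bge_link_upd() above (compare one status bit against
the cached link state, report only on a change) can be replayed over a
sequence of status samples.  This is a hedged userland sketch under stated
assumptions: SIM_TBI_PCS_SYNCHED, SIM_TBI_SIGNAL_DETECT and
count_transitions() are hypothetical names, the bit values are
placeholders (see if_bgereg.h for the real ones), and the media reset done
by bge_ifmedia_upd_locked() is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder bits; not the real BGE_MACSTAT_* values. */
#define SIM_TBI_PCS_SYNCHED	0x00000001u
#define SIM_TBI_SIGNAL_DETECT	0x00000002u

/*
 * Replay n MAC status samples through the "compare against cached
 * link state" logic, watching only the bit(s) in 'mask', starting
 * from the given initial link state (1 = up, 0 = down).  Returns the
 * number of UP/DOWN transitions that would have been reported via
 * if_link_state_change().
 */
static int
count_transitions(const uint32_t *samples, size_t n, uint32_t mask,
    int link)
{
	int transitions = 0;

	for (size_t i = 0; i < n; i++) {
		int up = (samples[i] & mask) != 0;

		/* Only a change relative to the cached state is reported. */
		if (up != link) {
			link = up;
			transitions++;
		}
	}
	return (transitions);
}
```

With a made-up trace in which signal detect stays set while PCS sync drops
out twice, {SIG|SYNC, SIG, SIG|SYNC, SIG, SIG|SYNC}, and the link initially
up, watching SIM_TBI_SIGNAL_DETECT reports 0 transitions while watching
SIM_TBI_PCS_SYNCHED reports 4, i.e. two spurious DOWN/UP pairs.  The trace
is illustrative only, not captured from a BCM5703.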


Maybe not the finest solution, but it works much better than the stock code;
fixes/thoughts/improvements welcome.

Best,

  -- Per


