Date:      Thu, 25 Aug 2011 14:26:37 -0700
From:      Devin Teske <devin.teske@fisglobal.com>
To:        FreeBSD Questions <freebsd-questions@freebsd.org>
Cc:        devin.teske@fisglobal.com
Subject:   Broadcom BCM5780 Link-UP before auto-negotiation completes
Message-ID:  <03f101cc636d$b01bd1a0$105374e0$@fisglobal.com>

Hi All,

I've got three different workstations each with a Broadcom Gigabit Ethernet card
(slightly different models, but all running the bge(4) device driver) on
FreeBSD-8.1 RELEASE.

We're having a strange problem where every single reboot ends up dropping to
single-user mode: the NFS mounts fail because the bge0 interface claims to be
up but hasn't yet completed auto-negotiation of the link speed (and reports
"no carrier").

After being dropped to single-user mode, you can press ENTER to accept the
default shell of /bin/sh and then type ^D to exit -- machine continues booting
just fine.

I've tried back-porting the recent changes from bge(4) in the
RELENG_8_2_0_RELEASE branch and even the RELENG_8 branch to no avail.

I was really disappointed because I could have sworn that one of these two SVN
revs (both published for RELENG_8_2_0_RELEASE) would have fixed the problem:

http://svnweb.freebsd.org/base?view=revision&revision=213808
Add more checks for resolved link speed in bge_miibus_statchg().
Link UP state could be reported first before actual completion of
auto-negotiation. This change makes bge(4) reprogram BGE_MAC_MODE,
BGE_TX_MODE and BGE_RX_MODE register only after controller got a
valid link.

http://svnweb.freebsd.org/base?view=revision&revision=213711
The IFF_DRV_RUNNING flag is set at the end of bge_init_locked. But
before setting the flag, interrupt was already enabled such that
interrupt handler could be run before setting IFF_DRV_RUNNING flag.
This can lose initial link state change interrupt which in turn
make bge(4) think that it still does not have valid link. Fix this
race by protecting the taskqueue with a driver lock.
While I'm here move reenabling interrupt code after handling of link
state change.

I'm afraid that our next recourse is going to be (in order of preference):

1. Try back-porting from an even newer source (HEAD -> RELENG_8_1_0_RELEASE;
RELENG_8 wasn't new enough and the bug still occurred).
2. Try a firmware upgrade of the Broadcom controller.
3. Write a custom rc.d script to detect when bge(4) is in use and sleep for a
few seconds before proceeding to NFS mounts.

And if none of those work...

4. Unceremoniously rip bge(4) from our kernels to prevent usage in production --
requiring the installation of a PCI or PCI-e or PCI-X network card that doesn't
suffer this issue.
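
For option 3, here is a rough, untested-on-hardware sketch of the polling
logic such a script might use. Rather than a fixed sleep, it waits until the
interface reports an active link or a timeout expires. The function names
(link_status, wait_for_link) and the 30-second default are made up for
illustration, and it assumes ifconfig(8) prints a "status: active" line once
the link has carrier.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the default timeout are assumptions.

# link_status IFACE
# Print the value of the interface's "status:" line, e.g. "active" or
# "no carrier". Prints nothing if the interface has no such line.
link_status() {
    ifconfig "$1" 2>/dev/null | sed -n 's/^[[:space:]]*status: //p'
}

# wait_for_link IFACE [TIMEOUT_SECONDS]
# Poll once per second. Return 0 as soon as the link is active,
# or 1 if the timeout expires first.
wait_for_link() {
    iface=$1
    timeout=${2:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ "$(link_status "$iface")" = "active" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

An rc.d script ordered before the NFS mounts could then call something like
"wait_for_link bge0 30" and log a warning if it returns non-zero. The exact
rcorder(8) keywords to use for that ordering would need checking against the
scripts in /etc/rc.d on 8.1.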

Suggestions welcome.
-- 
Devin

