From: Devin Teske
To: FreeBSD Questions
Cc: devin.teske@fisglobal.com
Date: Thu, 25 Aug 2011 14:26:37 -0700
Message-ID: <03f101cc636d$b01bd1a0$105374e0$@fisglobal.com>
Subject: Broadcom BCM5780 Link-UP before auto-negotiation completes

Hi All,

I've got three different workstations, each with a Broadcom Gigabit
Ethernet card (slightly different models, but all running the bge(4) device driver) on FreeBSD 8.1-RELEASE. We're having a strange problem where every single reboot ends up dropping to single-user mode: the NFS mounts fail because the bge0 interface claims to be up but hasn't yet completed auto-negotiation of the link speed (and reports "no carrier").

After being dropped to single-user mode, you can press ENTER to accept the default shell of /bin/sh and then type ^D to exit -- the machine continues booting just fine.

I've tried back-porting the recent bge(4) changes from the RELENG_8_2_0_RELEASE branch and even the RELENG_8 branch, to no avail. I was really disappointed, because I could have sworn that one of these two SVN revisions (both published for RELENG_8_2_0_RELEASE) would have fixed the problem:

http://svnweb.freebsd.org/base?view=revision&revision=213808

    Add more checks for resolved link speed in bge_miibus_statchg().
    Link UP state could be reported first before actual completion of
    auto-negotiation. This change makes bge(4) reprogram BGE_MAC_MODE,
    BGE_TX_MODE and BGE_RX_MODE register only after controller got a
    valid link.

http://svnweb.freebsd.org/base?view=revision&revision=213711

    The IFF_DRV_RUNNING flag is set at the end of bge_init_locked. But
    before setting the flag, interrupt was already enabled such that
    interrupt handler could be run before setting IFF_DRV_RUNNING flag.
    This can lose initial link state change interrupt which in turn make
    bge(4) think that it still does not have valid link. Fix this race by
    protecting the taskqueue with a driver lock. While I'm here move
    reenabling interrupt code after handling of link state change.

I'm afraid that our next recourse is going to be (in order of preference):

1. Try back-porting from an even further target (HEAD -> RELENG_8_1_0_RELEASE; RELENG_8 wasn't high enough and the bug still occurred).
2. Try a firmware upgrade of the Broadcom controller.
3. Write a custom rc.d script to detect when bge(4) is in use and sleep for a few seconds before proceeding to the NFS mounts.

And if none of those work...

4. Unceremoniously rip bge(4) from our kernels to prevent its use in production -- requiring the installation of a PCI, PCI-e, or PCI-X network card that doesn't suffer from this issue.

Suggestions welcome.

--
Devin
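
For reference, option 3 above could be sketched roughly as follows. This is only an illustration, not a tested fix: instead of a fixed sleep it polls ifconfig(8) output for "status: active" until a timeout expires. The function names, the 30-second default, and the IFCONFIG_CMD override (there only so the loop can be exercised without real hardware) are all assumptions made for this sketch, and the rc.d wiring (rc.subr, ordering before the NFS mounts) is left out.

```shell
#!/bin/sh
# Sketch only: poll an interface until it reports carrier, or give up.
# IFCONFIG_CMD is a hypothetical override hook for testing; it defaults
# to the system ifconfig.
: "${IFCONFIG_CMD:=ifconfig}"

# has_carrier IFACE
# Succeeds when the ifconfig output for IFACE contains "status: active".
has_carrier() {
    $IFCONFIG_CMD "$1" 2>/dev/null | grep -q 'status: active'
}

# wait_for_carrier IFACE [TIMEOUT_SECONDS]
# Polls once per second; returns 0 once the link is active, non-zero
# if the timeout (default 30 seconds, an arbitrary choice) runs out.
wait_for_carrier() {
    iface=$1
    remaining=${2:-30}
    while [ "$remaining" -gt 0 ]; do
        if has_carrier "$iface"; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    # Final check so that a timeout of 0 still reports the current state.
    has_carrier "$iface"
}
```

Something like `wait_for_carrier bge0 30 || echo "bge0: still no carrier"` would then run before the NFS mounts are attempted. Polling for carrier rather than sleeping a fixed number of seconds avoids both waiting too long and not waiting long enough, but whether it helps depends on the driver reporting link state correctly once negotiation finishes.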