Date:      Mon, 18 Dec 2006 22:40:55 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Jung-uk Kim <jkim@freebsd.org>
Cc:        cvs-src@freebsd.org, src-committers@freebsd.org, Scott Long <scottl@freebsd.org>, cvs-all@freebsd.org
Subject:   Re: cvs commit: src/sys/dev/bge if_bge.c
Message-ID:  <20061218220448.S1577@epsplex.bde.org>
In-Reply-To: <20061216031759.N11941@delplex.bde.org>
References:  <200612132051.kBDKppS4058663@repoman.freebsd.org> <200612131846.33252.jkim@FreeBSD.org> <20061214152805.D2109@besplex.bde.org> <20061216031759.N11941@delplex.bde.org>

On Sat, 16 Dec 2006, I wrote:

> On Thu, 14 Dec 2006, I wrote:
>
>> On Wed, 13 Dec 2006, Jung-uk Kim wrote:
>> 
>>> On Wednesday 13 December 2006 03:51 pm, Scott Long wrote:
>>>> scottl      2006-12-13 20:51:51 UTC
>>>> 
>>>>   FreeBSD src repository
>>>> 
>>>>   Modified files:
>>>>     sys/dev/bge          if_bge.c
>>>>   Log:
>>>>   Remove a redundant write of the firmware reset magic number.  It
>>>> ...
>>> I am still getting firmware handshake timeouts and/or watchdog
>>> timeouts.  Most importantly it panics or gets witness warnings (lots
>>> of 'memory modified after free').  Panic goes like this (while
>>> kldunload if_bge with dhclient enabled):
>>> 
>>> brgphy0: detached
>>> miibus0: detached
>>> bge0: firmware handshake timed out, found 0x4b657654
>>> bge0: firmware handshake timed out, found 0x4b657654
>> 
>> I have seen these for debugging the redundant-write problem (not for
>> detach but for bringing up the interface for the first time).  My 5701
>> just hangs if there is any redundant write (2 where the first one was
>> in bge_reset(), or 2 separate, or 2 where the second one was).  My
>> 5705 survives two separate sets of 256 repeated writes; however, then
>> the firmware handshake times out; however2, everything works normally
>> after ignoring the timeout except for printing the message.  I
>> just noticed that this error wasn't ignored until recently -- I noticed
>> the return statement being removed but not that it was in a critical
>> area.
>
> The debugging code doesn't seem to have been responsible for this.
> Now, without it I almost (?) always get handshake errors on the 5705,
> but never (?) on the 5701.  Apparently, the 3rd write (the one that
> was removed) was the only correctly placed one.

Avoiding the "write_op" part of the changes fixes the handshake errors
on my non-PCIE 5705.  write_op is only used to write the reset value and
one other value to BGE_MISC_CFG.  bge_writemem_ind() apparently writes
the reset to nowhere, but bge_writereg() still works.

%%%
Index: if_bge.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
retrieving revision 1.165
diff -u -2 -r1.165 if_bge.c
--- if_bge.c	15 Dec 2006 00:27:06 -0000	1.165
+++ if_bge.c	18 Dec 2006 10:44:05 -0000
@@ -2544,4 +2634,7 @@
  		if (sc->bge_flags & BGE_FLAG_PCIE)
  			write_op = bge_writemem_direct;
+		/* XXX bge_writemem_ind is wrong for at least reset of 5705. */
+		else if (sc->bge_asicrev == BGE_ASICREV_BCM5705)
+			write_op = bge_writereg_ind;
  		else
  			write_op = bge_writemem_ind;
%%%
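
For readers who don't have the source handy, the selection logic in the
patch can be sketched standalone as below.  This is only an illustration:
the struct, the enums and the three stub writers are hypothetical
stand-ins (they record which path was taken instead of touching
hardware), and only the order of the tests is taken from the diff.

```c
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the driver's real definitions. */
enum asic_rev { REV_5701, REV_5705, REV_OTHER };
enum wr_path { PATH_NONE, PATH_MEM_DIRECT, PATH_MEM_IND, PATH_REG_IND };

struct softc {
	int		is_pcie;	/* stand-in for the BGE_FLAG_PCIE test */
	enum asic_rev	asicrev;	/* stand-in for sc->bge_asicrev */
	enum wr_path	path;		/* which stub ran last */
	uint32_t	last_off;
	uint32_t	last_val;
};

typedef void (*write_op_t)(struct softc *, uint32_t, uint32_t);

/* Stubs: record the access only.  The real routines go to the chip. */
static void
writemem_direct(struct softc *sc, uint32_t off, uint32_t val)
{
	sc->path = PATH_MEM_DIRECT;
	sc->last_off = off;
	sc->last_val = val;
}

static void
writemem_ind(struct softc *sc, uint32_t off, uint32_t val)
{
	sc->path = PATH_MEM_IND;
	sc->last_off = off;
	sc->last_val = val;
}

static void
writereg_ind(struct softc *sc, uint32_t off, uint32_t val)
{
	sc->path = PATH_REG_IND;
	sc->last_off = off;
	sc->last_val = val;
}

/* Same order of tests as the patch: PCIE first, then the 5705 special case. */
static write_op_t
select_write_op(const struct softc *sc)
{
	if (sc->is_pcie)
		return (writemem_direct);
	if (sc->asicrev == REV_5705)
		return (writereg_ind);
	return (writemem_ind);
}
```

A caller would do select_write_op(sc)(sc, off, val); with is_pcie clear
and asicrev set to REV_5705 this ends up in writereg_ind(), which is the
case the patch adds.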

The panics might be caused by the change turning the reset into a no-op.
Resetting might be much more necessary for uninitialization than for
initialization.

The bug caused the following behaviour here:
- the problem with taking a long time to start serving nfs requests (with
   /usr nfs-mounted) became larger.  Normally, nfs tries to start before
   the interface is really up and then it takes about a minute to start.
   With the bug, it often got portmap errors and sometimes never started.
- after "ifconfig down", it took a reboot to bring the interface back up.

Bruce