Date:      Mon, 18 Dec 2006 23:58:24 -0500
From:      Scott Long <scottl@samsco.org>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        cvs-src@freebsd.org, src-committers@freebsd.org, Scott Long <scottl@freebsd.org>, cvs-all@freebsd.org, Jung-uk Kim <jkim@freebsd.org>
Subject:   Re: cvs commit: src/sys/dev/bge if_bge.c
Message-ID:  <45877170.4030307@samsco.org>
In-Reply-To: <20061218220448.S1577@epsplex.bde.org>
References:  <200612132051.kBDKppS4058663@repoman.freebsd.org> <200612131846.33252.jkim@FreeBSD.org> <20061214152805.D2109@besplex.bde.org> <20061216031759.N11941@delplex.bde.org> <20061218220448.S1577@epsplex.bde.org>

Bruce Evans wrote:
> On Sat, 16 Dec 2006, I wrote:
> 
>> On Thu, 14 Dec 2006, I wrote:
>>
>>> On Wed, 13 Dec 2006, Jung-uk Kim wrote:
>>>
>>>> On Wednesday 13 December 2006 03:51 pm, Scott Long wrote:
>>>>> scottl      2006-12-13 20:51:51 UTC
>>>>>
>>>>>   FreeBSD src repository
>>>>>
>>>>>   Modified files:
>>>>>     sys/dev/bge          if_bge.c
>>>>>   Log:
>>>>>   Remove a redundant write of the firmware reset magic number.  It
>>>>> ...
>>>> I am still getting firmware handshake timeouts and/or watchdog
>>>> timeouts.  Most importantly, it panics or gets witness warnings (lots
>>>> of 'memory modified after free').  Panic goes like this (while
>>>> kldunload if_bge with dhclient enabled):
>>>>
>>>> brgphy0: detached
>>>> miibus0: detached
>>>> bge0: firmware handshake timed out, found 0x4b657654
>>>> bge0: firmware handshake timed out, found 0x4b657654
>>>
>>> I have seen these for debugging the redundant-write problem (not for
>>> detach but for bringing up the interface for the first time).  My 5701
>>> just hangs if there is any redundant write (2 where the first one was
>>> in bge_reset(), or 2 separate, or 2 where the second one was).  My
>>> 5705 survives two separate sets of 256 repeated writes; however, then
>>> the firmware handshake times out; however2, everything works normally
>>> after ignoring the timeout except for printing the message.  I
>>> just noticed that this error wasn't ignored until recently -- I noticed
>>> the return statement being removed but not that it was in a critical
>>> area.
>>
>> The debugging code doesn't seem to have been responsible for this.
>> Now, without it I almost (?) always get handshake errors on the 5705,
>> but never (?) on the 5701.  Apparently, the 3rd write (the one that
>> was removed) was the only correctly placed one.
> 
> Avoiding the "write_op" part of the changes fixes the handshake errors
> on my non-PCIE 5705.  write_op is only used to write the reset value and
> one other value to BGE_MISC_CFG.  bge_writemem_ind() apparently writes
> the reset to nowhere, but bge_writereg() still works.
> 
> %%%
> Index: if_bge.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
> retrieving revision 1.165
> diff -u -2 -r1.165 if_bge.c
> --- if_bge.c    15 Dec 2006 00:27:06 -0000    1.165
> +++ if_bge.c    18 Dec 2006 10:44:05 -0000
> @@ -2544,4 +2634,7 @@
>          if (sc->bge_flags & BGE_FLAG_PCIE)
>              write_op = bge_writemem_direct;
> +        /* XXX bge_writemem_ind is wrong for at least reset of 5705. */
> +        else if (sc->bge_asicrev == BGE_ASICREV_BCM5705)
> +            write_op = bge_writereg_ind;
>          else
>              write_op = bge_writemem_ind;
> %%%
> 
> The panics might be caused by the change making the reset null.  Resetting
> might be much more necessary for uninitialization than for initialization.
> 
> The bug caused the following behaviour here:
> - the problem with taking a long time to start serving nfs requests (with
>   /usr nfs-mounted) became larger.  Normally, nfs tries to start before
>   the interface is really up and then it takes about a minute to start.
>   With the bug, it often got portmap errors and sometimes never started.
> - after "ifconfig down", it took a reboot to bring the interface back up.
> 
> Bruce

OK, this looks like a result of my not understanding a bit of the Linux
code that I read.  When doing the reset, the Linux equivalent of
bge_writemem_ind() is specifically avoided.
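
For reference, the selection logic in Bruce's diff can be sketched as a
standalone mock.  This is only an illustration, not driver code: the
softc layout, the last_write field, the helper name bge_pick_write_op(),
the stubbed write functions and the constant values are all assumptions
made so the fragment compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative values only; assumed, not copied from the real headers. */
#define BGE_FLAG_PCIE        0x01
#define BGE_ASICREV_BCM5701  0x00
#define BGE_ASICREV_BCM5705  0x03

/* Hypothetical cut-down softc; last_write exists only for this mock. */
struct bge_softc {
	int		 bge_flags;
	int		 bge_asicrev;
	const char	*last_write;
};

typedef void (*bge_write_op_t)(struct bge_softc *, int, int);

/* Stubs: each records which access method was used, nothing more. */
static void
bge_writemem_direct(struct bge_softc *sc, int off, int val)
{
	(void)off; (void)val;
	sc->last_write = "writemem_direct";
}

static void
bge_writemem_ind(struct bge_softc *sc, int off, int val)
{
	(void)off; (void)val;
	sc->last_write = "writemem_ind";
}

static void
bge_writereg_ind(struct bge_softc *sc, int off, int val)
{
	(void)off; (void)val;
	sc->last_write = "writereg_ind";
}

/* Mirrors the if/else chain from the diff, including the 5705 case. */
static bge_write_op_t
bge_pick_write_op(const struct bge_softc *sc)
{
	assert(sc != NULL);
	if (sc->bge_flags & BGE_FLAG_PCIE)
		return (bge_writemem_direct);
	if (sc->bge_asicrev == BGE_ASICREV_BCM5705)
		return (bge_writereg_ind);
	return (bge_writemem_ind);
}
```

With this mock, a non-PCIE 5705 gets bge_writereg_ind, a PCIE part gets
bge_writemem_direct, and everything else still falls through to
bge_writemem_ind.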

I'm on vacation for the next 10 days, but I'll try to put together a
patch that addresses this and other problems soon.  Ping me after the
first of the year otherwise.

Scott




