From owner-cvs-all@FreeBSD.ORG  Tue Dec 19 05:24:54 2006
Return-Path: <owner-cvs-all@FreeBSD.ORG>
X-Original-To: cvs-all@freebsd.org
Delivered-To: cvs-all@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id B130216A412;
	Tue, 19 Dec 2006 05:24:54 +0000 (UTC)
	(envelope-from scottl@samsco.org)
Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 1AF0543C9F;
	Tue, 19 Dec 2006 05:24:48 +0000 (GMT)
	(envelope-from scottl@samsco.org)
Received: from [127.0.0.1] (pooker.samsco.org [168.103.85.57])
	(authenticated bits=0)
	by pooker.samsco.org (8.13.4/8.13.4) with ESMTP id kBJ4wOV6070270;
	Mon, 18 Dec 2006 21:58:33 -0700 (MST)
	(envelope-from scottl@samsco.org)
Message-ID: <45877170.4030307@samsco.org>
Date: Mon, 18 Dec 2006 23:58:24 -0500
From: Scott Long <scottl@samsco.org>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US;
	rv:1.8.0.7) Gecko/20060910 SeaMonkey/1.0.5
MIME-Version: 1.0
To: Bruce Evans <bde@zeta.org.au>
References: <200612132051.kBDKppS4058663@repoman.freebsd.org>
	<200612131846.33252.jkim@FreeBSD.org>
	<20061214152805.D2109@besplex.bde.org>
	<20061216031759.N11941@delplex.bde.org>
	<20061218220448.S1577@epsplex.bde.org>
In-Reply-To: <20061218220448.S1577@epsplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-1.4 required=3.8 tests=ALL_TRUSTED autolearn=failed 
	version=3.1.1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on pooker.samsco.org
Cc: cvs-src@freebsd.org, src-committers@freebsd.org,
	Scott Long <scottl@freebsd.org>, cvs-all@freebsd.org,
	Jung-uk Kim <jkim@freebsd.org>
Subject: Re: cvs commit: src/sys/dev/bge if_bge.c
X-BeenThere: cvs-all@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: CVS commit messages for the entire tree <cvs-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-all>,
	<mailto:cvs-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-all>
List-Post: <mailto:cvs-all@freebsd.org>
List-Help: <mailto:cvs-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-all>,
	<mailto:cvs-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Dec 2006 05:24:54 -0000

Bruce Evans wrote:
> On Sat, 16 Dec 2006, I wrote:
> 
>> On Thu, 14 Dec 2006, I wrote:
>>
>>> On Wed, 13 Dec 2006, Jung-uk Kim wrote:
>>>
>>>> On Wednesday 13 December 2006 03:51 pm, Scott Long wrote:
>>>>> scottl      2006-12-13 20:51:51 UTC
>>>>>
>>>>>   FreeBSD src repository
>>>>>
>>>>>   Modified files:
>>>>>     sys/dev/bge          if_bge.c
>>>>>   Log:
>>>>>   Remove a redundant write of the firmware reset magic number.  It
>>>>> ...
>>>> I am still getting firmware handshake timeouts and/or watchdog
>>>> timeouts.  Most importantly it panics or gets witness warnings (lots
>>>> of 'memory modified after free').  Panic goes like this (while
>>>> kldunload if_bge with dhclient enabled):
>>>>
>>>> brgphy0: detached
>>>> miibus0: detached
>>>> bge0: firmware handshake timed out, found 0x4b657654
>>>> bge0: firmware handshake timed out, found 0x4b657654
>>>
>>> I have seen these for debugging the redundant-write problem (not for
>>> detach but for bringing up the interface for the first time).  My 5701
>>> just hangs if there is any redundant write (2 where the first one was
>>> in bge_reset(), or 2 separate, or 2 where the second one was).  My
>>> 5705 survives two separate sets of 256 repeated writes; however, then
>>> the firmware handshake times out; however2, everything works normally
>>> after ignoring the timeout except for printing the message.  I
>>> just noticed that this error wasn't ignored until recently -- I noticed
>>> the return statement being removed but not that it was in a critical
>>> area.
>>
>> The debugging code doesn't seem to have been responsible for this.
>> Now, without it I almost (?) always get handshake errors on the 5705,
>> but never (?) on the 5701.  Apparently, the 3rd write (the one that
>> was removed) was the only correctly placed one.
> 
> Avoiding the "write_op" part of the changes fixes the handshake errors
> on my non-PCIE 5705.  write_op is only used to write the reset value and
> one other value to BGE_MISC_CFG.  bge_writemem_ind() apparently writes
> the reset to nowhere, but bge_writereg() still works.
> 
> %%%
> Index: if_bge.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
> retrieving revision 1.165
> diff -u -2 -r1.165 if_bge.c
> --- if_bge.c    15 Dec 2006 00:27:06 -0000    1.165
> +++ if_bge.c    18 Dec 2006 10:44:05 -0000
> @@ -2544,4 +2634,7 @@
>          if (sc->bge_flags & BGE_FLAG_PCIE)
>              write_op = bge_writemem_direct;
> +        /* XXX bge_writemem_ind is wrong for at least reset of 5705. */
> +        else if (sc->bge_asicrev == BGE_ASICREV_BCM5705)
> +            write_op = bge_writereg_ind;
>          else
>              write_op = bge_writemem_ind;
> %%%
> 
> The panics might be caused by the change making the reset null.  Resetting
> might be much more necessary for uninitialization than for initialization.
> 
> The bug caused the following behaviour here:
> - the problem with taking a long time to start serving nfs requests (with
>   /usr nfs-mounted) became larger.  Normally, nfs tries to start before
>   the interface is really up and then it takes about a minute to start.
>   With the bug, it often got portmap errors and sometimes never started.
> - after "ifconfig down", it took a reboot to bring the interface back up.
> 
> Bruce

OK, this looks like the result of my misunderstanding a bit of the Linux
code that I read.  When doing the reset, the Linux equivalent of
bge_writemem_ind() is specifically avoided.
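
For illustration only, the selection logic in Bruce's patch above can be
sketched as a standalone C fragment.  This is a minimal sketch, not the
driver code: every sketch_* name, the flag and ASIC revision values, and
the recording of which path was taken are hypothetical stand-ins for the
real definitions in if_bge.c and if_bgereg.h.  It only shows the idea of
choosing the access routine through a function pointer so that the
indirect memory write is skipped for a non-PCIE 5705.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the real values live in if_bgereg.h. */
#define SKETCH_FLAG_PCIE     0x01
#define SKETCH_ASICREV_5701  1
#define SKETCH_ASICREV_5705  2

/* Which access path handled the most recent write (for the sketch only). */
enum sketch_path {
    SKETCH_PATH_NONE,
    SKETCH_PATH_MEM_DIRECT,
    SKETCH_PATH_MEM_IND,
    SKETCH_PATH_REG_IND
};

struct sketch_softc {
    int flags;                  /* stands in for sc->bge_flags */
    int asicrev;                /* stands in for sc->bge_asicrev */
    enum sketch_path last_path; /* records the routine that was called */
    uint32_t last_val;          /* records the value that was written */
};

typedef void (*sketch_write_op_t)(struct sketch_softc *, int, uint32_t);

/* The three routines only record the call; no hardware is touched. */
static void
sketch_writemem_direct(struct sketch_softc *sc, int off, uint32_t val)
{
    (void)off;
    sc->last_path = SKETCH_PATH_MEM_DIRECT;
    sc->last_val = val;
}

static void
sketch_writemem_ind(struct sketch_softc *sc, int off, uint32_t val)
{
    (void)off;
    sc->last_path = SKETCH_PATH_MEM_IND;
    sc->last_val = val;
}

static void
sketch_writereg_ind(struct sketch_softc *sc, int off, uint32_t val)
{
    (void)off;
    sc->last_path = SKETCH_PATH_REG_IND;
    sc->last_val = val;
}

/*
 * Mirror the if/else-if/else chain from the patch: PCIE parts use the
 * direct memory write, the 5705 uses the indirect register write, and
 * everything else falls back to the indirect memory write.
 */
static sketch_write_op_t
sketch_pick_reset_write_op(const struct sketch_softc *sc)
{
    if (sc->flags & SKETCH_FLAG_PCIE)
        return (sketch_writemem_direct);
    if (sc->asicrev == SKETCH_ASICREV_5705)
        return (sketch_writereg_ind);
    return (sketch_writemem_ind);
}
```

A caller would obtain the routine once with
sketch_pick_reset_write_op(&sc) and then issue the reset write through
the returned pointer, which keeps the chip-specific choice in one place.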

I'm on vacation for the next 10 days, but I'll try to put together a
patch that addresses this and other problems soon.  Ping me after the
first of the year otherwise.

Scott