From: Scott Long
Date: Mon, 18 Dec 2006 23:58:24 -0500
To: Bruce Evans
Cc: cvs-src@freebsd.org, src-committers@freebsd.org, Scott Long, cvs-all@freebsd.org, Jung-uk Kim
Subject: Re: cvs commit: src/sys/dev/bge if_bge.c
Message-ID: <45877170.4030307@samsco.org>
In-Reply-To: <20061218220448.S1577@epsplex.bde.org>

Bruce Evans wrote:
> On Sat, 16 Dec 2006, I wrote:
>
>> On Thu, 14 Dec 2006, I wrote:
>>
>>> On Wed, 13 Dec 2006, Jung-uk Kim wrote:
>>>
>>>> On Wednesday 13 December 2006 03:51 pm, Scott Long wrote:
>>>>> scottl 2006-12-13 20:51:51 UTC
>>>>>
>>>>> FreeBSD src repository
>>>>>
>>>>> Modified files:
>>>>> sys/dev/bge if_bge.c
>>>>> Log:
>>>>> Remove a redundant write of the firmware reset magic number. It
>>>>> ...
>>>> I am still getting firmware handshake timeouts and/or watchdog
>>>> timeouts. Most importantly it panics or gets witness warnings (lots
>>>> of 'memory modified after free'). Panic goes like this (while
>>>> kldunload if_bge with dhclient enabled):
>>>>
>>>> brgphy0: detached
>>>> miibus0: detached
>>>> bge0: firmware handshake timed out, found 0x4b657654
>>>> bge0: firmware handshake timed out, found 0x4b657654
>>>
>>> I have seen these for debugging the redundant-write problem (not for
>>> detach but for bringing up the interface for the first time). My 5701
>>> just hangs if there is any redundant write (2 where the first one was
>>> in bge_reset(), or 2 separate, or 2 where the second one was). My
>>> 5705 survives two separate sets of 256 repeated writes; however, then
>>> the firmware handshake times out; however2, everything works normally
>>> after ignoring the timeout except for printing the message. I
>>> just noticed that this error wasn't ignored until recently -- I noticed
>>> the return statement being removed but not that it was in a critical
>>> area.
>>
>> The debugging code doesn't seem to have been responsible for this.
>> Now, without it I almost (?) always get handshake errors on the 5705,
>> but never (?) on the 5701. Apparently, the 3rd write (the one that
>> was removed) was the only correctly placed one.
>
> Avoiding the "write_op" part of the changes fixes the handshake errors
> on my non-PCIE 5705. write_op is only used to write the reset value and
> one other value to BGE_MISC_CFG. bge_writemem_ind() apparently writes
> the reset to nowhere, but bge_writereg() still works.
>
> %%%
> Index: if_bge.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/bge/if_bge.c,v
> retrieving revision 1.165
> diff -u -2 -r1.165 if_bge.c
> --- if_bge.c 15 Dec 2006 00:27:06 -0000 1.165
> +++ if_bge.c 18 Dec 2006 10:44:05 -0000
> @@ -2544,4 +2634,7 @@
>      if (sc->bge_flags & BGE_FLAG_PCIE)
>          write_op = bge_writemem_direct;
> +    /* XXX bge_writemem_ind is wrong for at least reset of 5705. */
> +    else if (sc->bge_asicrev == BGE_ASICREV_BCM5705)
> +        write_op = bge_writereg_ind;
>      else
>          write_op = bge_writemem_ind;
> %%%
>
> The panics might be caused by the change making the reset null. Resetting
> might be much more necessary for uninitialization than for initialization.
>
> The bug caused the following behaviour here:
> - the problem with taking a long time to start serving nfs requests (with
>   /usr nfs-mounted) became larger. Normally, nfs tries to start before
>   the interface is really up and then it takes about a minute to start.
>   With the bug, it often got portmap errors and sometimes never started.
> - after "ifconfig down", it took a reboot to bring the interface back up.
>
> Bruce

Ok, this looks like a result of me not understanding a bit of the Linux
code that I read. When doing the reset, the Linux equivalent of
bge_writemem_ind() is specifically avoided. I'm on vacation for the next
10 days, but I'll try to put together a patch that addresses this and
other problems soon. Ping me after the first of the year otherwise.

Scott
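
For anyone following along, the write_op selection in the quoted patch can
be modelled outside the kernel. What follows is only a rough sketch under
stated assumptions: every mock_* and MOCK_* name, the flag and ASIC revision
values, and the stub writers are hypothetical stand-ins, not the driver's
real definitions. It mirrors only the branch order of the diff (PCIE first,
then the 5705 special case, then the default) and touches no hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in the bge driver. */
#define MOCK_FLAG_PCIE       0x1u
#define MOCK_ASICREV_BCM5701 1u
#define MOCK_ASICREV_BCM5705 2u

struct mock_softc {
	uint32_t flags;        /* models sc->bge_flags */
	uint32_t asicrev;      /* models sc->bge_asicrev */
	const char *last_path; /* which stub writer ran last (illustration only) */
};

typedef void (*mock_write_op_t)(struct mock_softc *, uint32_t, uint32_t);

/* Stub writers: they only record the path taken, nothing is written. */
void mock_writemem_direct(struct mock_softc *sc, uint32_t off, uint32_t val)
{
	(void)off; (void)val;
	sc->last_path = "writemem_direct";
}

void mock_writemem_ind(struct mock_softc *sc, uint32_t off, uint32_t val)
{
	(void)off; (void)val;
	sc->last_path = "writemem_ind";
}

void mock_writereg_ind(struct mock_softc *sc, uint32_t off, uint32_t val)
{
	(void)off; (void)val;
	sc->last_path = "writereg_ind";
}

/* Same branch order as the quoted diff: PCIE, then 5705, then default. */
mock_write_op_t mock_pick_write_op(const struct mock_softc *sc)
{
	assert(sc != NULL);
	if (sc->flags & MOCK_FLAG_PCIE)
		return mock_writemem_direct;
	if (sc->asicrev == MOCK_ASICREV_BCM5705)
		return mock_writereg_ind;
	return mock_writemem_ind;
}
```

In this model a non-PCIE 5705 ends up with the register-indirect writer,
and a non-PCIE 5701 keeps the memory-indirect one, which matches the
behaviour Bruce reports for his two cards. Whether other ASIC revisions
need the same treatment is not something the thread settles.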