Date:      Sat, 13 Jan 2007 18:19:30 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Sven Willenberger <sven@dmv.com>
Cc:        stable@FreeBSD.org, freebsd-amd64@FreeBSD.org, John Baldwin <jhb@FreeBSD.org>
Subject:   Re: Panic in 6.2-PRERELEASE with bge on amd64
Message-ID:  <20070113172849.E94785@delplex.bde.org>
In-Reply-To: <45A54FC9.8040900@dmv.com>
References:  <1168211205.22629.6.camel@lanshark.dmv.com> <20070109124826.M79616@delplex.bde.org> <1168353425.29047.8.camel@lanshark.dmv.com> <200701091150.15274.jhb@freebsd.org> <20070110132839.X16378@besplex.bde.org> <45A54FC9.8040900@dmv.com>

On Wed, 10 Jan 2007, Sven Willenberger wrote:

> Bruce Evans presumably uttered the following on 01/09/07 21:42:
>> Also look at nearby chain entries (especially at (rxidx - 1) mod 512)).
>> I think the previous 255 entries and the rxidx one should be
>> non-NULL since we should have refilled them as we used them (so the
>> one at rxidx is least interesting since we certainly just refilled
>> it), and the next 256 entries should be NULL since we bogusly only use
>> half of the entries.  If the problem is uninitialization, then I expect
>> all 512 entries except the one just refilled at rxidx to be NULL.

> (kgdb) p sc->bge_cdata.bge_rx_std_chain[rxidx]
> $1 = (struct mbuf *) 0xffffff0097a27900
> (kgdb) p rxidx
> $2 = 499
>
> since rxidx = 499, I assume you are most interested in 498:
> (kgdb) p sc->bge_cdata.bge_rx_std_chain[498]
> $3 = (struct mbuf *) 0xffffff00cf1b3100
>
> for the sake of argument, 500 is null:
> (kgdb) p sc->bge_cdata.bge_rx_std_chain[500]
> $13 = (struct mbuf *) 0x0
>
> the indexes with values basically are 243 through 499:
> (kgdb) p sc->bge_cdata.bge_rx_std_chain[241]
> $30 = (struct mbuf *) 0x0
> (kgdb) p sc->bge_cdata.bge_rx_std_chain[242]
> $31 = (struct mbuf *) 0x0
> (kgdb) p sc->bge_cdata.bge_rx_std_chain[243]
> $32 = (struct mbuf *) 0xffffff005d4ab700
> (kgdb) p sc->bge_cdata.bge_rx_std_chain[244]
> $33 = (struct mbuf *) 0xffffff004f644b00
>
> so it does not seem to be a problem with "uninitialization".

There are supposed to be only 256 nonzero entries (except briefly while
one is being refreshed), but the above indicates that there are 257: #243
through #499 inclusive give 257 nonzero entries.  Everything indicates that
entry #499 was null before it was refreshed, and that the loop in
bge_rxeof() is trying to process a descriptor one past the last valid
(previously handled) descriptor.  I cannot see why it might do this.
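To make the count concrete, here is a minimal user-space model, not driver code.  It assumes the std rx chain is just an array of 512 pointers; the names RX_STD_CNT, rx_chain_count(), rx_prev(), rx_run_length() and rx_chain_contiguous() are hypothetical.  rx_prev() is the (rxidx - 1) mod 512 arithmetic written so that it cannot go negative in C.

```c
#include <assert.h>
#include <stddef.h>

#define RX_STD_CNT	512	/* slots in the std rx chain (model) */
#define BGE_SSLOTS	256	/* slots the driver is expected to keep filled */

/* Count the non-NULL entries in the chain. */
int
rx_chain_count(void *const chain[RX_STD_CNT])
{
	int i, n = 0;

	for (i = 0; i < RX_STD_CNT; i++)
		if (chain[i] != NULL)
			n++;
	return (n);
}

/* (idx - 1) mod 512, adding RX_STD_CNT first so the result is never negative. */
int
rx_prev(int idx)
{
	return ((idx + RX_STD_CNT - 1) % RX_STD_CNT);
}

/* Number of slots in the inclusive range first..last, allowing wraparound. */
int
rx_run_length(int first, int last)
{
	assert(first >= 0 && first < RX_STD_CNT);
	assert(last >= 0 && last < RX_STD_CNT);
	return ((last - first + RX_STD_CNT) % RX_STD_CNT + 1);
}

/*
 * Return 1 if the non-NULL entries form at most one run (possibly
 * wrapping from slot 511 to slot 0), by counting NULL -> non-NULL edges.
 */
int
rx_chain_contiguous(void *const chain[RX_STD_CNT])
{
	int i, starts = 0;

	for (i = 0; i < RX_STD_CNT; i++)
		if (chain[rx_prev(i)] == NULL && chain[i] != NULL)
			starts++;
	return (starts <= 1);
}
```

With slots 243 through 499 filled, as in the kgdb output, this model reports 257 entries in one contiguous run, one more than BGE_SSLOTS.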
The next step might be to add active debugging code:
- check that m != NULL when m is taken off the rx chain (before refreshing
   its entry), and panic if it is NULL.
- check that there are always BGE_SSLOTS (256) nonzero mbufs in the std
   rx chain.  It would be interesting to know if they are always contiguous.
   They might not be since this depends on how the hardware uses them.
   Debugging is simpler if they are.
- check that bge_rxeof() is not reentered.
- check the rx producer index and related data before and after getting
   a null m.  It can easily change while bge_rxeof() is running, so
   recording its value before and after might be useful.
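The checks above could look roughly like the following.  This is only a sketch against a hypothetical user-space model: struct rx_model and the rxdbg_* helpers are invented for illustration and are not part of if_bge.c.  Failures are returned as bits so the logic can be exercised outside the kernel; in the driver itself one would call panic() or use KASSERT() instead.

```c
#include <assert.h>
#include <stddef.h>

#define RX_STD_CNT	512	/* slots in the std rx chain (model) */
#define BGE_SSLOTS	256	/* slots the driver is expected to keep filled */

/* Failure bits reported by the hypothetical debug checks. */
#define RXDBG_NULL_MBUF	0x1	/* m == NULL when taken off the chain */
#define RXDBG_BAD_COUNT	0x2	/* chain does not hold BGE_SSLOTS mbufs */
#define RXDBG_REENTERED	0x4	/* rxeof entered while already running */

/* Hypothetical stand-in for the parts of the softc being checked. */
struct rx_model {
	void	*chain[RX_STD_CNT];
	int	 in_rxeof;	/* reentrancy guard */
	int	 prod_before;	/* producer index recorded on entry */
	int	 prod_after;	/* producer index recorded at a NULL m */
};

/* Call on entry to the rxeof model; detects reentry. */
int
rxdbg_enter(struct rx_model *sc, int prod)
{
	if (sc->in_rxeof)
		return (RXDBG_REENTERED);
	sc->in_rxeof = 1;
	sc->prod_before = prod;
	return (0);
}

/* Call on every exit path. */
void
rxdbg_exit(struct rx_model *sc)
{
	sc->in_rxeof = 0;
}

/*
 * Take the mbuf at idx off the chain (before it would be refreshed),
 * and record the current producer index if it turns out to be NULL.
 */
int
rxdbg_take(struct rx_model *sc, int idx, int prod_now, void **mp)
{
	assert(idx >= 0 && idx < RX_STD_CNT);
	*mp = sc->chain[idx];
	sc->chain[idx] = NULL;
	if (*mp == NULL) {
		sc->prod_after = prod_now;
		return (RXDBG_NULL_MBUF);
	}
	return (0);
}

/* Check that exactly BGE_SSLOTS entries are non-NULL (call after refilling). */
int
rxdbg_check_count(const struct rx_model *sc)
{
	int i, n = 0;

	for (i = 0; i < RX_STD_CNT; i++)
		if (sc->chain[i] != NULL)
			n++;
	return (n == BGE_SSLOTS ? 0 : RXDBG_BAD_COUNT);
}
```

The count check is only meaningful after an entry has been refilled, since the chain legitimately holds 255 mbufs between rxdbg_take() and the refill.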

Bruce


