Date:      Wed, 7 Feb 2007 20:11:11 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Oleg Bulyzhin <oleg@freebsd.org>
Cc:        Robin Gruyters <r.gruyters@yirdis.nl>, freebsd-net@freebsd.org
Subject:   Re: [Fwd: Re: bge Ierr rate increase from 5.3R -> 6.1R]
Message-ID:  <20070207193426.P35180@besplex.bde.org>
In-Reply-To: <20070206221857.GA66675@lath.rinet.ru>
References:  <20070125170532.c9c2374hkwk4oc4k@server.yirdis.net> <20070205232800.GA45487@lath.rinet.ru> <20070207003539.I31484@besplex.bde.org> <20070206221857.GA66675@lath.rinet.ru>

On Wed, 7 Feb 2007, Oleg Bulyzhin wrote:

> On Wed, Feb 07, 2007 at 01:31:39AM +1100, Bruce Evans wrote:
>> I use jdp's quicker fix.  It works fine for detecting cable unplug
>> and replug, but link detection is still very bad at boot time and
>> after down/up (seems to be worse for down/up than unplug/replug?).
>> Link detection in -current generally seems to be much worse than
>> in 5.2.  On some systems I use two ping -c2's early in the boot to
>> wait for the link to actually be up.  The first ping tends to fail
>> and the second tends to work, both long after the link claims to
>> be up.  Then other network activity still takes too long to start.
>> Without the pings, an "ntpdate -b" early in the boot fails about
>> half the time and gives messed up timing activity when it fails,
>> and initial nfs mounts take 30-60 seconds.  Later after down/up and
>> waiting for the "up" message, ttcp -u usually fails to connect the
>> first time and then works normally with no failure or connection
>> delay the second time.
>
> Could you please give some more details? I'm interested in:
> - chip version

5701 and 5705.

> - are you using 'auto' media or fixed one?

Auto.

> - have you tried a verbose boot? (the MAC's link state (bge_link) is reported with it).

Not recently.

> - is there a recipe for triggering the erroneous behaviour? I'll try it on my
>  bge cards. (I have bcm5721, 5700 & 5701, but I didn't notice errors in
>  link handling with the -current driver (and I had problems with the 5.x driver))

Just boot or down/up.

> The fact that the 'ping' workaround helps points to a lost interrupt.

Ah, it could be my returning immediately in bge_intr() if the status
block hasn't been updated.  This wouldn't notice changes to the software
link status.
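
For concreteness, the early return is roughly this (a sketch of my local
version, not the stock driver; the field names follow if_bge.c, and the
status-block sync is shown but nothing else is):

	/*
	 * Bail out of bge_intr() before taking any locks if the status
	 * block says nothing happened.  This keeps shared interrupts
	 * cheap, but it also skips the software link-state check, so a
	 * link event flagged only in the softc can be missed.
	 */
	static void
	bge_intr(void *xsc)
	{
		struct bge_softc *sc = xsc;

		bus_dmamap_sync(sc->bge_cdata.bge_status_tag,
		    sc->bge_cdata.bge_status_map, BUS_DMASYNC_POSTREAD);
		if ((sc->bge_ldata.bge_status_block->bge_status &
		    BGE_STATFLAG_UPDATED) == 0)
			return;		/* nothing for us; possibly shared */

		BGE_LOCK(sc);
		/* ... the usual link/rx/tx processing ... */
		BGE_UNLOCK(sc);
	}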

>> % @@ -2699,7 +2702,7 @@
>> %
>> %  	if ((sc->bge_asicrev == BGE_ASICREV_BCM5700 &&
>> %  	    sc->bge_chipid != BGE_CHIPID_BCM5700_B2) ||
>> % -	    statusword || sc->bge_link_evt)
>> % +	    statusword || BGE_STS_BIT(sc, BGE_STS_LINK_EVT))
>> %  		bge_link_upd(sc);
>> %
>> %  	if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
>>
>> The software link status handling causes a problem in the interrupt
>> handler.  To avoid pessimizing the case of shared interrupts, the
>> interrupt handler should be able to read the status word from the
>> status block and return without doing anything if the interrupt is not
>> for it.  This can be done without acquiring the driver lock (since the
>> driver lock is neither necessary nor sufficient for accessing the
>> status block).  Software link status gets in the way of this, since
>> accessing it requires the driver lock.
> You are right, but at the moment bge_intr() is locked and does not care
> about shared interrupts.  If we are going to fix this I would vote for
> using the taskqueue API - in that case we can test the 'link event' flag
> inside the locked taskqueue thread.

It is fixed in mine :-), except for the complications with the link status.
I haven't really tested this because I try not to configure shared
interrupts, but accidentally configured an inactive rl device on the same
interrupt, so the case of an inactive rl with an active bge got tested.
This case isn't so interesting -- bge_intr() always does something
and gets slowed down by rl_intr() deciding that it has nothing to do.
At least the old version of rl that I use does a single PCI read in
rl_intr() to decide what to do, so the slowdown is minimal.

The taskqueue would be easiest to program.  I'm not sure whether it is
good for efficiency -- certainly not while the interrupt handler is a
normal one.
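
A sketch of the taskqueue arrangement (illustrative only -- bge_tq and
bge_intr_task are invented names here, and whether the fast handler may
read the status block without the lock is exactly the open question):

	static void
	bge_intr_fast(void *xsc)
	{
		struct bge_softc *sc = xsc;

		/* One memory read; cheap enough on a shared line.
		 * (Status-block dmamap sync omitted for brevity.) */
		if ((sc->bge_ldata.bge_status_block->bge_status &
		    BGE_STATFLAG_UPDATED) == 0)
			return;
		taskqueue_enqueue(sc->bge_tq, &sc->bge_intr_task);
	}

	static void
	bge_intr_task(void *context, int pending)
	{
		struct bge_softc *sc = context;

		BGE_LOCK(sc);
		/* The link-event flag can be tested here, under the lock. */
		if (BGE_STS_BIT(sc, BGE_STS_LINK_EVT))
			bge_link_upd(sc);
		/* ... rx/tx processing as in the current bge_intr() ... */
		BGE_UNLOCK(sc);
	}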

>> (Note that `statusword' in the
>> above is now bogusly named.  It used to be the status word from the
>> status block but is now the MAC status read from a PCI register.  It is still
>> from the status block in similar code in bge_poll().  Not pessimizing
>> the case of shared interrupts requires using the status block again,
>> and then the read from the PCI register might become a dummy.)
> We cannot avoid the PCI register read before syncing the status block -
> we need to flush data posted at the PCI bridge, otherwise we can 'miss'
> an interrupt.

Actually, that might be avoidable too -- handle any missed interrupts by
polling.  If interrupts are rarely missed then the extra latency for
polling wouldn't matter.
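
For example, the existing bge_tick() callout could notice a status block
that was updated but never serviced (a sketch; bge_intr_locked() is a
hypothetical factoring of the locked part of the current handler):

	static void
	bge_tick(void *xsc)
	{
		struct bge_softc *sc = xsc;

		BGE_LOCK_ASSERT(sc);

		/*
		 * If an interrupt was lost, the updated bit is still set
		 * here up to a full tick later; service the status block
		 * by hand.  The extra latency only matters if interrupts
		 * go missing often.
		 */
		if (sc->bge_ldata.bge_status_block->bge_status &
		    BGE_STATFLAG_UPDATED)
			bge_intr_locked(sc);	/* hypothetical helper */

		/* ... normal tick work: stats, PHY poll, watchdog ... */
		callout_reset(&sc->bge_stat_ch, hz, bge_tick, sc);
	}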

BTW, avoiding the PCI read in bge_start_locked() (rev.1.108) gives a
remarkably large difference in performance for small packets.  Rev.1.108
claims a 1.8% speedup, but I got over 50% (370 kpps -> 560 kpps) when
I applied 1.108 to an old version.  The PCI read was adding about 60%
to (the already very large) per-packet CPU overheads for small packets,
since for small packets the CPU can't keep up, so tx queue lengths are
always 0 or 1 and the PCI read is never amortized across multiple
packets.
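
For reference, the rev.1.108 change amounts to something like this
(paraphrased, not the exact diff):

	/* Before: one PCI read per bge_start_locked() call. */
	prodidx = CSR_READ_4(sc, BGE_MBX_TX_HOST_PROD0_LO);

	/* After: keep a software copy and write it through to the mailbox. */
	prodidx = sc->bge_tx_prodidx;
	/* ... bge_encap() the queued packets onto the tx ring ... */
	CSR_WRITE_4(sc, BGE_MBX_TX_HOST_PROD0_LO, prodidx);
	sc->bge_tx_prodidx = prodidx;

With long tx queues the single read is amortized over many packets,
which is why the average-case number in the commit log is so much
smaller than the small-packet case above.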

The PCI read in bge_intr() is only expensive in similar cases -- when there
are only a few packets per interrupt.

>> Elsewhere, you added code to force an interrupt after link status changes.
>> I think this is to get the interrupt handler to look at the software
>> link status in the above.  When I first saw it, I thought that forcing
>> an interrupt is needed in more places.
> The forced interrupt is used for TBI cards since they do not have PHY auto-polling.

Do they set BGE_STATFLAG_UPDATED?

Bruce


