Date:      Tue, 7 May 2002 11:12:35 +0200 (CEST)
From:      remy@boostworks.com
To:        paul@nerdlabs.com
Cc:        jdp@polstra.com, stable@FreeBSD.ORG, kendall@jedis.com
Subject:   Re: fxp0: SCB timeout
Message-ID:  <200205070916.g479GKs06066@luxren2.boostworks.com>
In-Reply-To: <200205062255.48215.paul@nerdlabs.com>

On  6 May, Paul Dlug wrote:
> 
> If the general consensus is that adding a PCI NIC and disabling the
> onboard will fix it, I'll go that route. But I'd rather try to
> understand what the actual problem is and hopefully get a fix for it.
> Does anyone have suggestions for further debugging?
> 

About a month ago I spent around a week trying to understand the
problem. In case it helps, here is the tale:

The machine is an Intel SCB2 (SMP, ServerWorks HE-SL based) with a
64-bit/66 MHz PCI NIC using a 21154BC transparent PCI bridge and two
82559 A0 revision chips.

The symptoms, under heavy load only, were:

- 82559 stopping sending (with packets ready to go in ring)
- 82559 sending storms of Pause FC frames
- Spurious 'SCB timeout' (82559 stopped)
- Machine lock

All of these occurred in various combinations. When the machine locked
up, the IPMI records indicated a PCI protocol violation followed by a
processor internal error.
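
For anyone who wants to confirm the Pause storm from a capture, here is
a minimal sketch of recognizing an IEEE 802.3x PAUSE frame. It assumes
the buffer starts at the destination MAC address, carries no VLAN tag,
and comes from something that actually passes MAC Control frames up (an
analyzer or mirror port; many NICs consume them silently). The function
names are hypothetical and not part of any driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reserved multicast destination used by 802.3x PAUSE frames. */
static const uint8_t pause_dst[6] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x01 };

/* Returns 1 if buf holds a PAUSE frame, 0 otherwise. */
int
is_pause_frame(const uint8_t *buf, size_t len)
{
	/* 14-byte Ethernet header + 2-byte opcode + 2-byte pause time. */
	if (buf == NULL || len < 18)
		return 0;
	if (memcmp(buf, pause_dst, sizeof(pause_dst)) != 0)
		return 0;
	/* EtherType 0x8808 = MAC Control. */
	if (buf[12] != 0x88 || buf[13] != 0x08)
		return 0;
	/* Opcode 0x0001 = PAUSE. */
	return buf[14] == 0x00 && buf[15] == 0x01;
}

/* Requested pause time, in units of 512 bit times. Only call this
 * after is_pause_frame() has returned 1 for the same buffer. */
uint16_t
pause_quanta(const uint8_t *buf)
{
	return (uint16_t)((buf[16] << 8) | buf[17]);
}
```

Counting how many frames per second pass this test would give a rough
measure of the storm.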

I tried various BIOS updates. No hope.

I instrumented the fxp driver to monitor the 82559 internal states and
found some strange transitions (Idle to Active to Stopped). I also
tested various hacks for resetting and restarting the chip, as Intel
does in its driver for the A0 revision. Nothing worked properly. The
only thing I found is that refraining from reading statistics from the
chip makes it a little more reliable.
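
The kind of instrumentation described above can be sketched roughly as
follows. This is a userland illustration, not the code that was
actually used: the bit layout (command unit status in bits 7:6 of the
low byte of the SCB status word) is taken from my reading of the Intel
8255x developer manual and FreeBSD's if_fxpreg.h and should be checked
against them, the state names are the manual's and may not map
one-to-one onto the terms used in this message, and cus_name() and
cus_trace_sample() are hypothetical helpers. In the driver the byte
would presumably come from a read of the SCB status register.

```c
#include <stdint.h>
#include <stdio.h>

#define CUS_MASK	0xc0	/* command unit status, bits 7:6 (assumed) */

/* Human-readable name for the CU state encoded in a status byte. */
const char *
cus_name(uint8_t scb_status)
{
	switch (scb_status & CUS_MASK) {
	case 0x00:
		return "idle";
	case 0x40:
		return "suspended";
	default:		/* 0x80 and 0xc0 */
		return "active";
	}
}

struct cus_trace {
	uint8_t last;		/* last CU field seen */
	unsigned changes;	/* number of transitions recorded */
};

/* Feed one sampled status byte. Logs and returns 1 when the CU state
 * differs from the previous sample, returns 0 otherwise. */
int
cus_trace_sample(struct cus_trace *t, uint8_t scb_status)
{
	uint8_t cus = scb_status & CUS_MASK;

	if (cus == t->last)
		return 0;
	fprintf(stderr, "fxp: CU %s -> %s\n", cus_name(t->last),
	    cus_name(cus));
	t->last = cus;
	t->changes++;
	return 1;
}
```

Sampling at each interrupt and at each watchdog tick is one way to get
a usable trace without flooding the console.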

I also tried various PCI bridge settings for back-to-back transactions,
timeouts, error forwarding, etc., with (mainly) worse results.
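
To make the knobs concrete, here is a small sketch that decodes the
standard Bridge Control register (16 bits at offset 0x3e of a
PCI-to-PCI bridge's configuration header), using the bit assignments of
the PCI-to-PCI Bridge Architecture Specification. It does not cover any
21154-specific registers, and struct bridge_ctl and
bridge_ctl_decode() are names made up for this example.

```c
#include <stdint.h>

#define BC_PARITY_RESP	0x0001	/* parity error response enable */
#define BC_SERR_ENABLE	0x0002	/* forward secondary SERR# to primary */
#define BC_FAST_B2B	0x0080	/* fast back-to-back on secondary bus */
#define BC_PRI_DISCARD	0x0100	/* primary discard timer select */
#define BC_SEC_DISCARD	0x0200	/* secondary discard timer select */
#define BC_DISCARD_SERR	0x0800	/* assert SERR# on discard timeout */

struct bridge_ctl {
	int parity_response;
	int serr_forward;
	int fast_b2b;
	unsigned pri_discard_clocks;
	unsigned sec_discard_clocks;
	int discard_serr;
};

/* Split a raw Bridge Control value into its individual settings. */
struct bridge_ctl
bridge_ctl_decode(uint16_t bc)
{
	struct bridge_ctl d;

	d.parity_response = (bc & BC_PARITY_RESP) != 0;
	d.serr_forward = (bc & BC_SERR_ENABLE) != 0;
	d.fast_b2b = (bc & BC_FAST_B2B) != 0;
	/* Timer select: 0 = 2^15 PCI clocks, 1 = 2^10 PCI clocks. */
	d.pri_discard_clocks = (bc & BC_PRI_DISCARD) ? 1u << 10 : 1u << 15;
	d.sec_discard_clocks = (bc & BC_SEC_DISCARD) ? 1u << 10 : 1u << 15;
	d.discard_serr = (bc & BC_DISCARD_SERR) != 0;
	return d;
}
```

The raw value can be read from the bridge's configuration space, for
instance with pciconf(8), and decoded before and after each experiment
so the settings actually in effect are on record.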

I carefully read the 82559 manuals, the 21154 documentation, the
release notes for these components, advisories to board designers, the
history of chip revisions, and diffs between various releases of the
Linux driver written at Intel. I ended up with the feeling that the
really important instrument, for a definitive answer, is a PCI protocol
analyzer.

I then got _exactly_ the same card but with a FW21154BE transparent
bridge and the same 82559 A0 revision chips. Everything worked without a
glitch.

My conclusion is that there is a big problem with the PCI side of the
82559 A0 revision in combination with a 66 MHz bus, some bridges, and
back-to-back transactions and/or MWI. To quote the Intel site, under
PCI bridges: (bold, red heading text)

  "The 21154AC and 21154BC transparent PCI-to-PCI Bridges are _Not
   Recommended_ for new designs. For 33 MHz applications use the
   FW21154AE. For 66 MHz applications use the FW21154BE."
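
For an experiment along these lines on the device side, both features
are controlled by standard bits in the PCI Command register (16 bits at
configuration offset 0x04): bit 4 enables Memory Write and Invalidate
and bit 9 enables fast back-to-back transactions. The sketch below only
computes a Command value with both cleared; it is not a fix I am
claiming, writing the value back is left to whatever configuration
access method is at hand, and pci_cmd_conservative() is a hypothetical
name.

```c
#include <stdint.h>

#define PCI_CMD_MWI_ENABLE	0x0010	/* bit 4: Memory Write and Invalidate */
#define PCI_CMD_FAST_B2B	0x0200	/* bit 9: fast back-to-back enable */

/* Return cmd with MWI and fast back-to-back disabled, leaving all
 * other bits (I/O, memory, bus master, ...) untouched. */
uint16_t
pci_cmd_conservative(uint16_t cmd)
{
	return cmd & (uint16_t)~(PCI_CMD_MWI_ENABLE | PCI_CMD_FAST_B2B);
}

/* Return 1 if either feature is currently enabled in cmd. */
int
pci_cmd_uses_mwi_or_b2b(uint16_t cmd)
{
	return (cmd & (PCI_CMD_MWI_ENABLE | PCI_CMD_FAST_B2B)) != 0;
}
```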

Apparently, boards with the ServerWorks SL chipset (STL2) may hit the
problem with the 82559 (6 out of 10 here, more likely with additional
PCI cards such as video). The SCB2 (HE-SL based) uses the 82550 and
does not have the problem. The same goes for some Pro 10/100+ cards
(low-profile version) that use the 21154BC (not the FW21154!) with the
82550 and work fine. Earlier Pro 10/100B cards used the 21154 and the
82558, with some rare problems. The Dell 1550 is an SCB2/SCSI
(prototype) version using the 82559 A0 and may have the problem.
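
Since so much depends on which chip and stepping is on the board, a
quick check of the PCI revision ID (the byte at configuration offset
0x08) tells the variants apart. The values below are the ones I recall
from FreeBSD's if_fxpreg.h; treat them as assumptions and verify them
against the header. fxp_rev_name() and fxp_rev_is_82559_a0() are
hypothetical helpers, not driver functions.

```c
#include <stdint.h>

/* Revision IDs as recalled from if_fxpreg.h (assumed, please verify). */
#define REV_82558_A4	4
#define REV_82558_B0	5
#define REV_82559_A0	8
#define REV_82559S_A	9
#define REV_82550	12
#define REV_82550_C	13

/* Map a PCI revision ID to a chip name for logging. */
const char *
fxp_rev_name(uint8_t rev)
{
	switch (rev) {
	case REV_82558_A4:
		return "82558 A4";
	case REV_82558_B0:
		return "82558 B0";
	case REV_82559_A0:
		return "82559 A0";
	case REV_82559S_A:
		return "82559S A";
	case REV_82550:
		return "82550";
	case REV_82550_C:
		return "82550 C";
	default:
		return "unknown";
	}
}

/* Return 1 for the stepping this message is about. */
int
fxp_rev_is_82559_a0(uint8_t rev)
{
	return rev == REV_82559_A0;
}
```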

Just for the record, the Intel drivers also show the SCB timeout
syndrome with some configurations and, up to now, no definitive
solution exists aside from changing either the PCI bridge or the
network chip.

Good luck.

RN.
IeM



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message



