Date:      Fri, 21 Jun 2002 11:00:57 +0200
From:      Holger Kipp <holger.kipp@alogis.com>
To:        frank@exit.com
Cc:        pjklist@ekahuna.com, stable@FreeBSD.ORG
Subject:   Re: Status of fxp / smp problem?
Message-ID:  <3D12EB49.3E3CC0D5@alogis.com>
References:  <200206202151.g5KLpXJ9065056@realtime.exit.com>

Frank Mayhar wrote:
> 
> Philip J. Koenig wrote:
> > Over the past couple/few weeks there were lots of reports of systems
> > which had trouble with the fxp (Intel Pro 10/100 NIC) drivers,
> > particularly on SMP systems.

It is not only the sym/fxp combination: reports also involve sym/ata,
sym alone, fxp alone, or even other drivers.

> > Are people still having these problems with 4.6-RELEASE or RELENG4?
> > I've been waiting for an indication this has been fixed, as I've got
> > a couple of boxes here waiting to be installed, that I wanted to
> > update first - but not if that problem was still there, as they're
> > both SMP boxes that use the affected Intel NICs.
> 
> I don't think the NIC really matters, as I've seen it even without an
> fxp.  I managed to alleviate the problem somewhat by killing the dnetc
> processes.  It appears that pegging CPUs makes the problem much worse (I'm
> not sure whether pegging one is sufficient or if both should be pegged;
> based on my experience, though, I strongly suspect the former).

There is a workaround available for the sym driver (I'm not sure whether
it has been committed yet) that checks for stalled IRQs: if an IRQ is set
but has evidently not been cleared for an extended period of time, it
forces the driver to service the interrupt.
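
A minimal sketch of what such a stalled-IRQ check could look like, assuming
a periodic timer polls the device. This is not the actual sym workaround:
irq_watchdog, irq_watchdog_poll, the tick limit and the fake_dev stand-in
are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical watchdog state; none of these names come from the sym driver. */
struct irq_watchdog {
    bool (*irq_pending)(void *dev);  /* reads the device's interrupt status */
    void (*service)(void *dev);      /* the driver's normal interrupt routine */
    void *dev;                       /* opaque device handle */
    unsigned stalled_ticks;          /* consecutive polls with the IRQ still set */
    unsigned limit;                  /* polls tolerated before forcing service */
    unsigned forced;                 /* how often service had to be forced */
};

/*
 * Call from a periodic timer. If the IRQ has stayed pending for `limit`
 * consecutive polls, assume the interrupt was lost and run the service
 * routine by hand. Returns true when service was forced.
 */
bool irq_watchdog_poll(struct irq_watchdog *w)
{
    assert(w != NULL && w->irq_pending != NULL && w->service != NULL);

    if (!w->irq_pending(w->dev)) {
        w->stalled_ticks = 0;        /* interrupt was handled normally */
        return false;
    }
    if (++w->stalled_ticks < w->limit)
        return false;                /* pending, but not for long enough yet */

    w->service(w->dev);              /* stalled: force driver service */
    w->stalled_ticks = 0;
    w->forced++;
    return true;
}

/* Simulated device used only to exercise the watchdog. */
struct fake_dev {
    bool pending;
    unsigned serviced;
};

bool fake_pending(void *dev)
{
    return ((struct fake_dev *)dev)->pending;
}

void fake_service(void *dev)
{
    struct fake_dev *d = dev;
    d->pending = false;              /* servicing clears the interrupt */
    d->serviced++;
}
```

With limit set to 3 and a fake device whose interrupt is never delivered,
the third call to irq_watchdog_poll() forces service and clears the pending
state. Such a check only masks a lost interrupt; it does not explain why
the interrupt was lost.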

The problem seems to manifest itself especially on systems with:
  - shared IRQs  AND
  - SMP enabled
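
The combination above can be expressed as a small check over a table of
device-to-IRQ assignments. This is only a sketch: the irq_assignment type
and both function names are hypothetical, and the table would have to be
filled in by hand from the boot messages of the machine in question.

```c
#include <assert.h>
#include <stddef.h>

/* One device and the IRQ line it was assigned (hypothetical type). */
struct irq_assignment {
    const char *device;
    int irq;
};

/* Count how many devices in the table use the given IRQ line. */
size_t irq_user_count(const struct irq_assignment *tab, size_t n, int irq)
{
    size_t count = 0;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tab[i].irq == irq)
            count++;
    return count;
}

/*
 * Returns 1 if the system matches the pattern described above:
 * SMP enabled AND at least one IRQ line shared by two or more devices.
 */
int matches_problem_pattern(const struct irq_assignment *tab, size_t n,
                            int smp_enabled)
{
    if (!smp_enabled)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (irq_user_count(tab, n, tab[i].irq) > 1)
            return 1;
    return 0;
}
```

A match only means the machine fits the pattern seen in the reports; it
says nothing about whether the problem will actually occur there.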

I don't know enough about IRQ handling, but I'd say this should be tracked
down. It might be a hardware problem in some cases, but it could also be
some quirk in the IRQ handling code, not necessarily a problem with the
specific drivers (apart from timing issues, maybe).

> I've still had no good suggestions as to what to examine.  I looked at
> the low-level code, but there were no obvious smoking guns there.  A few
> commits in the relevant time period, but none that seemed likely to cause
> interrupt problems.

I once saw a problem report for a high-end 4-processor system, but
there I couldn't find any shared IRQs...

I intend to write up a summary this weekend, so maybe that will help a bit.

Regards,
Holger


-- 
Holger Kipp, Dipl.-Math., Systemadministrator  | alogis AG
Fon: +49 (0)30 / 43 65 8 - 114                 | Berliner Strasse 26
Fax: +49 (0)30 / 43 65 8 - 214                 | D-13507 Berlin Tegel
email: holger.kipp@alogis.com                  | http://www.alogis.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message



