Date:      Fri, 21 Jun 2002 04:41:23 -0700
From:      "Philip J. Koenig" <pjklist@ekahuna.com>
To:        stable@FreeBSD.ORG
Cc:        frank@exit.com, Holger Kipp <holger.kipp@alogis.com>
Subject:   Re: Status of fxp / smp problem?
Message-ID:  <20020621114124174.AAA690@empty1.ekahuna.com@pc02.ekahuna.com>
In-Reply-To: <3D12EB49.3E3CC0D5@alogis.com>

On 21 Jun 2002, at 11:00, Holger Kipp boldly uttered: 

> Frank Mayhar wrote:
> > 
> > Philip J. Koenig wrote:
> > > Over the past couple/few weeks there were lots of reports of systems
> > > which had trouble with the fxp (Intel Pro 10/100 NIC) drivers,
> > > particularly on SMP systems.
> 
> Not only sym/fxp, but also sym/ata, only sym, or only fxp or even others.


One of the reasons I need to make sure stuff is working before I 
install these boxes in a remote location: all of them are SMP boxes, 
all of them have fxp hardware, all of them have ata hardware, and one 
of them has sym hardware. :-)

 
> > > Are people still having these problems with 4.6-RELEASE or RELENG4?
> > > I've been waiting for an indication this has been fixed, as I've got
> > > a couple of boxes here waiting to be installed, that I wanted to
> > > update first - but not if that problem was still there, as they're
> > > both SMP boxes that use the affected Intel NICs.
> > 
> > I don't think the NIC really matters, as I've seen it even without an
> > fxp.  I managed to alleviate the problem somewhat by killing the dnetc
> > processes.  It appears that pegging CPUs makes the problem much worse (I'm
> > not sure whether pegging one is sufficient or if both should be pegged;
> > based on my experience, though, I strongly suspect the former).
> 
> There is a workaround available for the sym drivers (I'm not sure if it is
> already committed), which checks for stalled IRQs, forcing driver service
> if IRQ is set but obviously not cleared for a longer period of time.
> 
> Problem seems to manifest itself especially on systems with:
>   - shared IRQs  AND
>   - SMP enabled
>
> I don't know enough about IRQ handling, but I'd say this should be tracked
> down - might be a hardware problem in some cases, but could also be some
> quirk in IRQ handling code, not necessarily a problem with the specific
> drivers (apart from timing issues, maybe).
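
To make the stalled-IRQ workaround described above a bit more concrete, here is a rough toy model of the idea in Python. It is only a sketch and not the actual sym patch: the Device class, the watchdog() function, the tick counter and the timeout value are all hypothetical names made up for illustration.

```python
class Device:
    """Toy stand-in for a driver instance (hypothetical, not a real driver)."""

    def __init__(self, name):
        self.name = name
        self.irq_pending_since = None  # tick when the IRQ was raised, None if clear
        self.serviced = 0              # how many times the handler has run

    def raise_irq(self, now):
        # Remember only the first tick at which the line became active.
        if self.irq_pending_since is None:
            self.irq_pending_since = now

    def service(self):
        # Running the handler clears the pending interrupt.
        self.irq_pending_since = None
        self.serviced += 1


def watchdog(devices, now, timeout):
    """Force service on every device whose IRQ has been pending too long."""
    forced = []
    for dev in devices:
        since = dev.irq_pending_since
        if since is not None and now - since >= timeout:
            dev.service()
            forced.append(dev.name)
    return forced


if __name__ == "__main__":
    sym0 = Device("sym0")
    sym0.raise_irq(now=0)                          # interrupt raised, never delivered
    print(watchdog([sym0], now=3, timeout=5))      # too early, nothing forced
    print(watchdog([sym0], now=5, timeout=5))      # stalled long enough, forced
```

The point is only that the periodic check papers over a lost interrupt; it does not explain why the interrupt was lost in the first place.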


One piece of info you might consider is that SMP systems require 
the "I/O APIC", i.e. the advanced programmable interrupt controller.  I 
am aware of two particular functions of the APIC: expanding the 
traditional 16 IRQ levels to 24, and mapping a legacy IRQ to an 
APIC IRQ (i.e. >15), which helps avoid IRQ sharing in some 
cases.
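
As a rough illustration of the second function, here is a small Python model of how moving devices onto APIC IRQs above 15 can remove sharing. It is a sketch under loud assumptions: the remap_shared() function, the device names and the IRQ numbers are invented for the example, and on real hardware the routing is fixed by the board wiring and the BIOS tables rather than chosen freely like this.

```python
LEGACY_IRQS = 16  # IRQ 0-15 on the traditional pair of 8259 PICs
APIC_PINS = 24    # inputs on a typical single I/O APIC


def remap_shared(devices):
    """Map device name -> IRQ, moving sharers to APIC IRQs 16..23.

    `devices` maps a device name to its legacy IRQ.  Any device that
    shares its legacy IRQ with another device gets its own APIC IRQ
    while free ones remain; everything else keeps its legacy IRQ.
    """
    # Count how many devices sit on each legacy IRQ.
    users = {}
    for name, irq in devices.items():
        users[irq] = users.get(irq, 0) + 1

    result = {}
    next_pin = LEGACY_IRQS
    for name, irq in devices.items():
        if users[irq] > 1 and next_pin < APIC_PINS:
            result[name] = next_pin   # gets an unshared APIC IRQ
            next_pin += 1
        else:
            result[name] = irq        # unshared already, or no pins left
    return result


if __name__ == "__main__":
    # Hypothetical box: fxp0 and sym0 share legacy IRQ 10.
    print(remap_shared({"fxp0": 10, "sym0": 10, "ata0": 14}))
```

In this made-up case fxp0 and sym0 end up on IRQs 16 and 17 while ata0 stays on 14, so nothing is shared any more.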

I suspect the reason this problem shows up most often on SMP boxes 
(apart from the likelihood that those boxes are simply "busier" than 
less sophisticated ones) has something to do with APIC/IRQ 
handling.

Also, I know that on the Intel SMP boards there is a BIOS setting for 
which version of the MP specification the board follows: 1.1 or 1.4.  
I always make sure mine are set for 1.4.  There is also a BIOS option 
to turn the APIC on or off, if I'm not mistaken.

Just some thoughts from a non-programmer... we now return you to your 
regularly-scheduled programming... :-)

Phil


 

--
Philip J. Koenig                                       pjklist@ekahuna.com
Electric Kahuna Systems -- Computers & Communications for the New Millenium





