Date:      Fri, 21 Jun 2002 12:10:51 +0200
From:      Holger Kipp <holger.kipp@alogis.com>
To:        Pete French <pfrench@firstcallgroup.co.uk>
Cc:        frank@exit.com, pjklist@ekahuna.com, stable@FreeBSD.ORG
Subject:   Re: Status of fxp / smp problem?
Message-ID:  <3D12FBAB.8C676DA9@alogis.com>
References:  <E17LKoD-0001RG-00@mailhost.firstcallgroup.co.uk>

Pete French wrote:
> 
> > Not only sym/fxp, but also sym/ata, only sym, or only fxp or even others.
> 
> Did we ever track down a time window as to *when* the changes were made
> that caused this to start happening? For me it was the update on May 22
> that started it all going wrong, but I can't (unfortunately) remember what
> date of -STABLE the machine was running up until that point.

Hmm - I could produce similar errors with 4.5-RELEASE (not exactly the same
behaviour: the fxp behaved more sluggishly, so to speak, so I didn't get
system hangs as abruptly). As a guess, code changes (improvements) that
change the timing noticeably might lead to these problems coming to the
surface more often.

> > Problem seems to manifest itself especially on systems with:
> >   - shared IRQs  AND
> >   - SMP enabled
> 
> One added thing here - I had shared IRQs between ata and sym, and the
> problem went away when I took the ata driver out of the kernel. *But* I
> do not have any devices attached to the ata controller, so (presumably)
> it could not have actually been interrupting?

You have two drivers that have to react to the same IRQ, so maybe it's some
sort of race condition... But that's more for developers, who know their
IRQs by heart <grin>.

> Speculation: presumably with a shared IRQ the system scans devices it
> knows are attached to that IRQ until it finds one which needs service? Any
> ideas what order it will do this in - i.e. would it be possible for it
> to scan ata, followed by sym, and for there to be some oddity in the IRQ
> code that stops it continuing on to scan sym under certain circumstances?
> Unsure as to how this might happen, and I haven't looked at the IRQ code,
> but I do have a machine on which I can reproduce the problem 100% reliably
> if that helps.
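
To make the speculation above concrete, here is a minimal user-space sketch
in C of two ways a shared IRQ could be dispatched. It is only a model under
stated assumptions, not FreeBSD kernel code: the names struct dev, dev_intr,
dispatch_all, dispatch_first and still_pending are all hypothetical, and
whether the real interrupt code ever stops scanning early is exactly the
open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device on a shared IRQ line (not FreeBSD code). */
struct dev {
    const char *name;
    bool pending;   /* device is currently asking for service */
    int serviced;   /* how many times its handler did real work */
};

/* A handler checks its own device and reports whether it did any work. */
static bool dev_intr(struct dev *d)
{
    assert(d != NULL);
    if (!d->pending)
        return false;
    d->pending = false;
    d->serviced++;
    return true;
}

/* Scan the whole chain: every handler is called, whatever the others report. */
static int dispatch_all(struct dev *chain[], size_t n)
{
    int claimed = 0;
    for (size_t i = 0; i < n; i++)
        if (dev_intr(chain[i]))
            claimed++;
    return claimed;
}

/* Assumed flaw: stop at the first handler that claims the interrupt. */
static int dispatch_first(struct dev *chain[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (dev_intr(chain[i]))
            return 1;
    return 0;
}

/* Count the devices still waiting for service after a dispatch pass. */
static size_t still_pending(struct dev *chain[], size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (chain[i]->pending)
            count++;
    return count;
}
```

With a chain of { ata, sym } where both devices are pending, dispatch_first
services ata and returns, leaving sym waiting until something raises the IRQ
again; dispatch_all services both in one pass. Scan order only matters in
the first variant.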

Wish I had the time. *Sigh* Or is there an IRQ debugging switch somewhere
in the system? That reminds me of an old assembler problem I once had (6502),
where I couldn't debug the problem with debug statements, as they changed the
timing such that the bug didn't occur...

> PS: committing that sym workaround would be really nice as I could at least
>     then use our Compaq multiprocessor machines reliably.

Hmm, looks like Gérard hasn't had the time to polish his code and commit
it yet. I'd suggest we give him some more time before we complain, as he also
has a living to make ;-)

-- 
Holger Kipp, Dipl.-Math., Systemadministrator  | alogis AG
Fon: +49 (0)30 / 43 65 8 - 114                 | Berliner Strasse 26
Fax: +49 (0)30 / 43 65 8 - 214                 | D-13507 Berlin Tegel
email: holger.kipp@alogis.com                  | http://www.alogis.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message



