Date:      Mon, 20 May 2002 22:45:07 +0200
From:      Holger Kipp <holger.kipp@alogis.com>
To:        frank@exit.com
Cc:        Pete French <pfrench@firstcallgroup.co.uk>, stable@FreeBSD.ORG, Maildrop <maildrop@qwest.net>
Subject:   Re: 4.6-RC system hangs (fxp0, smp, sym)
Message-ID:  <3CE96053.23367D00@alogis.com>
References:  <200205201600.g4KG0gdt056186@realtime.exit.com>

(Answers to individual mails below)
Here is the current state on my side:

System hangs after some traffic on a 10Mbit link. The hang can be
triggered by running 'ping -f' against a stupid Win98 system. It occurs
after 50,000 to 1,400,000 packets. I'll have to investigate further to
see if this depends on other system activity...
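
A minimal sketch of how the reproduction could be scripted so that the
packet count before the hang is easier to read off. It is not part of the
original test. The function names, the batch size and the target address
192.0.2.1 are hypothetical, and flood ping needs root. With DRY_RUN=1 the
commands are only printed.

```shell
#!/bin/sh
# Hypothetical helpers; names, batch size and target are assumptions.

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

flood_batch() {
    # $1 = target host, $2 = packets in this batch
    run ping -f -q -c "$2" "$1"
}

flood_count() {
    # $1 = target host, $2 = packets per batch, $3 = number of batches
    # Prints a running total after each batch; stops if ping fails.
    total=0
    i=0
    while [ "$i" -lt "$3" ]; do
        flood_batch "$1" "$2" >/dev/null || break
        total=$((total + $2))
        echo "sent so far: $total"
        i=$((i + 1))
    done
}
```

For example, `flood_count 192.0.2.1 50000 30` would cover the upper end of
the range seen here. The last total printed on the console before the
system stops responding gives a rough packet count.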

SMP Buildworld is very stable, so no bad memory or anything.

Only happens with SMP, otherwise system is rock stable even in 
combination with heavy IO, buildworld, ping -f etc.

Happens with 4.5-RELEASE and 4.6-RC (as of 19.05.2002, 15:00), though
TCP/IP seems more stable on 4.6-RC (and the hang is more 'solid' ;-)

With SYM 53C875 it is also possible to use NCR instead of SYM.
Hangs with both. Differences (tested on 4.6-RC, SMP):
SYM: after the hang, taking the interface down does not unhang the
     system immediately, even though ping at once reports that the
     network is down. It takes a minute or two until I get
     "(noperiph:sym0:0:-1:-1): SCSI BUS reset detected"
     and the system is responsive again.
NCR: taking down interface is immediately effective. No additional
     messages.

Message "fxp0:device timeout" with both SYM and NCR.

Apart from that, I don't get anything else, not even a 'Buffer is full'
error, if I try to ping other systems during the hang.

Looks a bit like a deadlock between some NIC-resources (maybe fxp-
related) and Symbios-SCSI-resources, but I'm no kernel-hacker.
The Toshiba Magnia 3000 (128M, 2x350/512k) is a pure test server, so if
I can help out with debugging or testing of any kind, please let me know.

Regards,
Holger



Maildrop wrote:
> to get the device back, I did `ifconfig fxp0 down; ifconfig fxp0 up`.
> Not really a long term fix, but was able to get back into the system
> that way.

This only works if 'ifconfig' is already in memory, as disk access is
not possible during the hang...

> Try a ping from that network device when it is in a hung state, I bet
> you get a "Buffer is full" error.

No, I get exactly nothing.

Frank Mayhar wrote:
> Pete French wrote:
> > Interesting. I see exactly the same behaviour on a Compaq Proliant
> > server as of the latest SUP. The problem is only exhibited under
> > SMP - UP appears to work fine. I don't have fxp cards though, they
> > are all tl's
> >
> > Does this look familiar ?
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=37043

After copying 141 MB from one disk to another several times (with NCR,
not SYM, I have to admit), I still got no error with disk IO. Have you
tried using NCR instead? I don't know if your chipset is supported by
the NCR device driver, though.

> Fascinating.  Same problem here, only with a Tyan Thunder 2500
> (Serverworks chipset).  Symbios 53C896 SCSI, fxp0 ethernet.
> Hangs, but no messages, no bus resets, nothing.  I do occasionally
> see "psmintr: out of sync" messages,
> but I think that's just a symptom of the hang, not a cause.

The bus reset only happens after I take the offending interface down
with 'ifconfig fxp0 down'. With the SYM driver, this might take a
minute or two. With NCR, the system unhangs almost instantly...
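
A minimal sketch of the down/up workaround with a pause in between, to
allow for the delay seen with the SYM driver. The function names and the
120-second wait are assumptions, and nothing here has been tested on a
hung system. The script would have to be started before the hang, and
'ifconfig' would still need to be in memory. With DRY_RUN=1 the commands
are only printed.

```shell
#!/bin/sh
# Hypothetical helper; the wait time is an assumption, not a measured value.

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

cycle_iface() {
    # $1 = interface name, $2 = seconds to wait between down and up
    run ifconfig "$1" down
    run sleep "$2"
    run ifconfig "$1" up
}
```

For example, `cycle_iface fxp0 120` with SYM, or a wait of 0 with NCR,
where the system recovers almost at once.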
