Date:      Wed, 19 Oct 2005 09:05:43 +0200
From:      Dan Bilik <dan@mail.neosystem.cz>
To:        freebsd-current@freebsd.org
Cc:        Bill Paul <wpaul@FreeBSD.ORG>
Subject:   Re: Possible fxp(4) problem in -CURRENT
Message-ID:  <20051019090543.244d6603.dan@mail.neosystem.cz>
In-Reply-To: <20051018203012.2186B16A422@hub.freebsd.org>
References:  <20051018215950.7defb35e.dan@mail.neosystem.cz> <20051018203012.2186B16A422@hub.freebsd.org>

On Tue, 18 Oct 2005 20:30:12 +0000 (GMT)
wpaul@FreeBSD.ORG (Bill Paul) wrote:

>> Today one of the problem machines got stuck again. I was able to
>> log on through second functional interface and watch it more
>> closely. Sending packets from the box worked (its arp requests were
>> appearing on other boxes in the subnet) but it could not receive
>> any packet. And another thing... It seems that running tcpdump (ie.
>> entering and leaving promiscuous mode) on the interface resolved
>> the problem and made the machine to appear back on the network.
>> It's running with no problem from that moment.
> ...
> - The chip has experienced an RX overrun, where all of the descriptors
>   in its RX DMA ring have been filled by the chip before the driver
>   has had a chance to drain them. When this happens, the chip may
>   require the RX unit to be resumed.
> - For some reason, the RX handler code in the driver has fallen out
>   of sync with the chip, i.e. the current descriptor index has gotten
>   clobbered, or maybe the chip was restarted and the index wasn't
>   properly reset.
> RX overruns are obviously the result of a very busy network (or a very
> busy host processor that can't service the NIC frequently enough to
> drain the RX ring). If the network is busy, it would be with a lot of
> small packets.

Yes, that's exactly the case. The box runs boa to serve HTTP
requests for static content (mostly small to medium-sized images). There
are around 1k established short-lived connections and 50-70% CPU usage
for most of the day. We have also tried polling(4) on the problem
machines, but it didn't help (though CPU usage went down).

The same hardware serving the same purpose but running 4.9-RELEASE has
never jammed that way. It runs for months without a problem.

> You should run vmstat -i or something to monitor the interrupt rate
> on the failing interface and see if it peaks right before it goes
> deaf.

OK, I'm going to periodically collect this information on the problem
boxes. Thanks.
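
A minimal sketch of such periodic collection, in Python. It assumes
that each data line of `vmstat -i` has the shape "name  total  rate"
(for example "irq16: fxp0  123456  100") and that the interface of
interest is fxp0; the function names, the device name and the
10-second interval are illustrative choices, not anything taken from
the driver or from this thread.

```python
import subprocess
import time


def parse_vmstat_i(text):
    """Map each interrupt source to its cumulative count.

    Assumes data lines shaped like 'irq16: fxp0   123456   100'
    (source name, total, rate); other lines are ignored.
    """
    totals = {}
    for line in text.splitlines():
        fields = line.rsplit(None, 2)  # the source name may contain spaces
        if len(fields) != 3 or not fields[1].isdigit():
            continue  # header line or an unexpected format
        totals[fields[0].strip()] = int(fields[1])
    return totals


def interrupt_rate(before, after, seconds, device):
    """Interrupts per second for sources whose name mentions `device`."""
    delta = sum(
        after[name] - before.get(name, 0)
        for name in after
        if device in name
    )
    return delta / seconds


def sample():
    """Run `vmstat -i` once and return the parsed totals."""
    result = subprocess.run(
        ["vmstat", "-i"], capture_output=True, text=True, check=True
    )
    return parse_vmstat_i(result.stdout)


if __name__ == "__main__":
    interval = 10  # seconds between samples (arbitrary)
    device = "fxp0"  # assumed name of the failing interface
    previous = sample()
    while True:
        time.sleep(interval)
        current = sample()
        rate = interrupt_rate(previous, current, interval, device)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} {device} {rate:.1f} intr/s", flush=True)
        previous = current
```

Redirecting the output to a file leaves a timestamped log, so the
rate just before the interface goes deaf can be read off afterwards.
The rate is computed from the difference between two samples rather
than from the "rate" column, which as far as I know is an average
since boot and would hide a short peak.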

Dan
