Date:      Mon, 29 Mar 2010 21:21:42 +0200
From:      Attila Nagy <bra@fsn.hu>
To:        pyunyh@gmail.com
Cc:        Mailing List FreeBSD Stable <freebsd-stable@freebsd.org>, Michael Loftis <mloftis@wgops.com>
Subject:   Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
Message-ID:  <4BB0FDC6.7050105@fsn.hu>
In-Reply-To: <20100329183848.GE1473@michelle.cdnetworks.com>
References:  <4BAB718C.3090001@fsn.hu> <886B21E1787F0003B89E34B6@[192.168.1.44]>	<4BB087B7.3030602@fsn.hu> <20100329183848.GE1473@michelle.cdnetworks.com>

Pyun YongHyeon wrote:
> On Mon, Mar 29, 2010 at 12:57:59PM +0200, Attila Nagy wrote:
>   
>> Hi,
>>
>> Michael Loftis wrote:
>>     
>>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy <bra@fsn.hu>
>>> wrote:
>>>
>>> <...>
>>>       
>>>> Both unbound and python accept DNS requests, and it seems that when
>>>> interrupt load is at 25%, only unbound is in the *udp state; when it is
>>>> at 50%, both programs are in that state.
>>>>         
>>> Try turning off hardware TSO/checksum offload if it's available on your
>>> chipset?  ifconfig <interface> -rxcsum -txcsum -tso -- I'm only using
>>> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
>>> under high load.  We're pretty sure it's mostly the nfe driver, or the
>>> chips themselves, but have never ruled out some generic 8.x hardware
>>> offload issues.
>>>       
>> Bingo, this solved the problem. The current uptime nears four days.
>> Previously I couldn't go further than a day.
>>
>> The machine gets very light TCP load (and other machines, which do get
>> TCP load, work well), so I guess it's UDP RX or TX checksum related.
>>
>>     
>
> Hmm, this is an unexpected result. Since you're using UDP, TSO is not
> involved in this issue. Because you disabled RX/TX checksum
> offloading, could you check how many 'bad checksum' and
> 'no checksum' datagrams netstat(1) reports?
> To narrow down which side of checksum offloading causes the issue,
> would you disable just one side at a time? For instance, disable TX
> checksum offloading with RX checksum offloading enabled and see how
> bce(4) works.
> #ifconfig bce0 -txcsum rxcsum
> If that shows the same issue, try disabling RX checksum offloading
> but enabling TX checksum offloading.
> #ifconfig bce0 txcsum -rxcsum
>   
It's interesting. During the day, I disabled only HW checksumming and
left TSO enabled. It couldn't run for more than a few hours.
I have disabled TSO again to see what happens.

BTW, of course there is TCP traffic on that interface (DNS is also
available on TCP), so maybe this causes the problem.
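
For the 'bad checksum' and 'no checksum' counters requested in the quoted
text, a minimal sketch of how they could be pulled out of the netstat(1)
output is shown here. The function name udp_csum_counters is hypothetical,
and the sketch assumes the UDP statistics lines read "N with bad checksum"
and "N with no checksum", as FreeBSD's netstat -s -p udp normally prints
them.

```shell
# Read `netstat -s -p udp` output on stdin and print the two UDP
# checksum counters on one line. Assumes lines of the form
# "<count> with bad checksum" and "<count> with no checksum".
udp_csum_counters() {
    awk '/with bad checksum/ { bad = $1 }
         /with no checksum/  { none = $1 }
         END { printf "bad=%d none=%d\n", bad, none }'
}

# Usage on the affected host (not run here):
#   netstat -s -p udp | udp_csum_counters
```

Running it before and after each ifconfig change (-txcsum only, then
-rxcsum only) and comparing the two numbers may help show which side of
the offload is involved.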


