Date:      Tue, 30 Mar 2010 12:17:53 +0200
From:      Attila Nagy <bra@fsn.hu>
To:        pyunyh@gmail.com
Cc:        Mailing List FreeBSD Stable <freebsd-stable@freebsd.org>, Michael Loftis <mloftis@wgops.com>
Subject:   Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
Message-ID:  <4BB1CFD1.9040602@fsn.hu>
In-Reply-To: <20100329194131.GG1473@michelle.cdnetworks.com>
References:  <4BAB718C.3090001@fsn.hu> <886B21E1787F0003B89E34B6@[192.168.1.44]>	<4BB087B7.3030602@fsn.hu>	<20100329183848.GE1473@michelle.cdnetworks.com>	<4BB0FDC6.7050105@fsn.hu> <20100329194131.GG1473@michelle.cdnetworks.com>

Pyun YongHyeon wrote:
> On Mon, Mar 29, 2010 at 09:21:42PM +0200, Attila Nagy wrote:
>   
>> Pyun YongHyeon wrote:
>>     
>>> On Mon, Mar 29, 2010 at 12:57:59PM +0200, Attila Nagy wrote:
>>>   
>>>       
>>>> Hi,
>>>>
>>>> Michael Loftis wrote:
>>>>     
>>>>         
>>>>> --On Thursday, March 25, 2010 3:22 PM +0100 Attila Nagy <bra@fsn.hu>
>>>>> wrote:
>>>>>
>>>>> <...>
>>>>>       
>>>>>           
>>>>>> Both unbound and python accept DNS requests, and it seems that when
>>>>>> interrupt load is at 25%, only unbound is in the *udp state; when it
>>>>>> is at 50%, both programs are in that state.
>>>>>>         
>>>>>>             
>>>>> Try turning off hardware TSO/checksum offload if it's available on your
>>>>> chipset?  ifconfig <interface> -rxcsum -txcsum -tso -- I'm only using
>>>>> nfe chips right now, but w/ the TSO/CSUM on they lock up constantly
>>>>> under high load.  We're pretty sure it's mostly the nfe driver, or the
>>>>> chips themselves, but have never ruled out some generic 8.x hardware
>>>>> offload issues.
>>>>>       
>>>>>           
>>>> Bingo, this solved the problem. The current uptime nears four days.
>>>> Previously I couldn't go further than a day.
>>>>
>>>> The machine gets very light TCP load (and other machines which do get
>>>> TCP load work well), so I guess it's UDP RX or TX checksum related.
>>>>
>>>>     
>>>>         
>>> Hmm, this is an unexpected result. Since you're using UDP, TSO is not
>>> involved in this issue. Since you disabled RX/TX checksum
>>> offloading, could you check how many 'bad checksum' and
>>> 'no checksum' datagrams netstat(1) reports?
>>> To narrow down which side of checksum offloading causes the issue,
>>> would you disable just one side at a time? For instance, disable TX
>>> checksum offloading with RX checksum offloading enabled and see how
>>> bce(4) works.
>>> # ifconfig bce0 -txcsum rxcsum
>>> If that shows the same issue, try disabling RX checksum offloading
>>> but enabling TX checksum offloading.
>>> # ifconfig bce0 txcsum -rxcsum
>>>   
>>>       
>> Interesting. During the day I disabled only HW checksumming and
>> left TSO enabled. It couldn't run for more than a few hours.
>> I have disabled TSO again to see what happens.
>>
>> BTW, of course there is TCP traffic on that interface (DNS is also
>> available on TCP), maybe this causes the problem.
>>     
>
> The only guess I can think of at this moment is incorrect use of
> bus_dma(9) in the TX path. But I'm not sure this is related to the
> issue you're seeing. Would you try the experimental patch at the
> following URL?
> http://people.freebsd.org/~yongari/bce/bce.20100305.diff
> Please make sure to back up your old bce(4) driver before applying
> the patch. I didn't see anything abnormal in testing, but it
> wasn't stressed much.
>   
With the default settings (rx, tx csum, tso) it froze in about an hour:
CPU:  0.0% user,  0.0% nice,  0.0% system, 25.0% interrupt, 75.0% idle
  714 bind         4 102    0  1200M  1182M *lle    3  17:24  0.00% unbound
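
For anyone repeating the one-side-at-a-time test quoted above, here is a
minimal sketch for pulling the two UDP checksum counters out of
`netstat -s -p udp`, so they can be compared before and after each
ifconfig change. The helper name udp_csum_counters is made up for
illustration, and the parsing assumes the usual FreeBSD counter lines
("N with bad checksum", "N with no checksum"); adjust it if your output
differs.

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -s -p udp` output on stdin and
# prints the "bad checksum" and "no checksum" counters on one line.
# A counter that does not appear in the input is reported as 0.
udp_csum_counters() {
    awk '
        /with bad checksum/ { bad = $1 }
        /with no checksum/  { none = $1 }
        END { printf "bad=%d none=%d\n", bad, none }
    '
}

# Intended use on the affected host (commands taken from the thread,
# left commented out so the sketch has no side effects):
#   netstat -s -p udp | udp_csum_counters
#   ifconfig bce0 -txcsum rxcsum    # TX checksum offload off, RX on
#   ... let it run under load, then compare ...
#   netstat -s -p udp | udp_csum_counters
#   ifconfig bce0 txcsum -rxcsum    # the other way round
```

Keeping the parsing separate from the ifconfig calls means the counters
can also be logged from cron while the box is under load, without
touching the interface settings.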