Date:      Thu, 25 Mar 2010 11:36:28 -0700
From:      Pyun YongHyeon <pyunyh@gmail.com>
To:        Attila Nagy <bra@fsn.hu>
Cc:        Mailing List FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
Message-ID:  <20100325183628.GD1278@michelle.cdnetworks.com>
In-Reply-To: <4BAB718C.3090001@fsn.hu>
References:  <4BAB718C.3090001@fsn.hu>

On Thu, Mar 25, 2010 at 03:22:04PM +0100, Attila Nagy wrote:
> Hi,
> 
> I have some recursive nameservers, running unbound and 7.2-STABLE #0: 
> Wed Sep  2 13:37:17 CEST 2009 on a bunch of HP BL460c machines (bce 
> interfaces).
> These work OK.
> 
> During the process of migrating to 8.x, I've upgraded one of these 
> machines to 8.0-STABLE #25: Tue Mar  9 18:15:34 CET 2010 (the dates 
> indicate approximately when the source was checked out from 
> cvsup.hu.freebsd.org; I don't know the exact revision).
> 
> The first problem was that the machine occasionally lost network access 
> for some minutes. I could log in on the console, and I could see the 
> processes involved in network IO in the "keglim" state, but I couldn't 
> do any network IO. This lasted for some minutes, then everything came 
> back to normal.
> I fixed this issue by raising kern.ipc.nmbclusters to 51200 (double 
> its default size); since then I haven't seen these blackouts.
> 
> But now the machine freezes. It can run for about a day, and then it 
> just freezes. I can't even break into the debugger by sending it an NMI.
> top says:
> last pid: 92428;  load averages:  0.49,  0.40,  0.38    up 0+21:13:18  07:41:43
> 43 processes:  2 running, 38 sleeping, 1 zombie, 2 lock
> CPU:  1.3% user,  0.0% nice,  1.3% system, 26.0% interrupt, 71.3% idle
> Mem: 1682M Active, 99M Inact, 227M Wired, 5444K Cache, 44M Buf, 5899M Free
> Swap:
> 
>   PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> 45011 bind         4  49    0  1734M  1722M RUN     2  37:42 22.17% unbound
>   712 bind         3  44    0 70892K 19904K uwait   0  71:07  3.86% python2.6
> 
> What these freezes seem to have in common is the high interrupt load. 
> Normally, under load, the CPU times look like this:
> CPU:  3.5% user,  0.0% nice,  1.8% system,  0.4% interrupt, 94.4% idle
> 
> I could observe a "freeze", where top remained running and everything 
> was 0%, except interrupt, which was 25% exactly (the machine has four 
> cores), and another, where I could save the following console output:
> CPU:  0.0% user,  0.0% nice,  0.2% system, 50.0% interrupt, 49.8% idle
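
The "25% exactly (the machine has four cores)" remark rests on one step of arithmetic: assuming top(1) was run without -P, its CPU line is an average over all CPUs, so on a four-core box 25% interrupt time corresponds to one core spending all of its time in interrupt context, and 50% to two. A minimal sketch of that conversion (the function name is hypothetical):

```python
def saturated_cores(interrupt_pct: float, ncpu: int) -> float:
    """Convert top(1)'s system-wide interrupt percentage into the number
    of CPUs that would be fully busy servicing interrupts.

    Assumes the CPU line is averaged over all CPUs (top without -P),
    so 100% corresponds to ncpu fully busy cores.
    """
    return interrupt_pct / 100.0 * ncpu


if __name__ == "__main__":
    ncpu = 4  # the machine in the report has four cores
    for pct in (25.0, 50.0):
        cores = saturated_cores(pct, ncpu)
        print(f"{pct:.1f}% interrupt on {ncpu} CPUs = {cores:.1f} cores")
```

Read this way, the two observed freezes look like one and then two CPUs stuck in interrupt handling.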

When you see a high number of interrupts, could you check whether they
come from bce(4)? I guess you can use systat(1) to check how many
interrupts bce(4) is generating.
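
For a one-off snapshot rather than systat(1)'s live view, FreeBSD's vmstat -i prints a total count and an average rate per interrupt source. The sketch below filters that output for bce entries. It assumes each data line ends with two integer columns (total, rate) preceded by the source name; that layout, the helper name bce_interrupts and the sample line in the test are assumptions, not taken from this thread. Because the rate column is averaged since boot, taking two snapshots a few seconds apart and diffing the totals shows a live interrupt storm better.

```python
import subprocess


def bce_interrupts(vmstat_output: str) -> dict[str, tuple[int, int]]:
    """Return {source: (total, rate)} for interrupt lines mentioning bce.

    Assumes vmstat -i style lines: '<name...> <total> <rate>'.
    Header and non-numeric lines are skipped.
    """
    result: dict[str, tuple[int, int]] = {}
    for line in vmstat_output.splitlines():
        fields = line.split()
        # Data lines end with two integer columns: total count and rate.
        if len(fields) < 3 or not (fields[-1].isdigit() and fields[-2].isdigit()):
            continue
        name = " ".join(fields[:-2])
        if "bce" in name:
            result[name] = (int(fields[-2]), int(fields[-1]))
    return result


if __name__ == "__main__":
    out = subprocess.run(
        ["vmstat", "-i"], capture_output=True, text=True, check=True
    ).stdout
    for name, (total, rate) in bce_interrupts(out).items():
        print(f"{name}: total={total} rate={rate}/s")
```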

> .......(partial, broken line)....32M  2423M *udp    1  50:16 10.89% unbound
>   714 bind         3  44    0 70892K 26852K uwait   3   8:41  4.69% python2.6
> 61004 root         1  62    0 37428K 10876K *udp    1   0:00  1.56% python
>   706 root         1  44    0  2696K   624K piperd  1   0:07  0.00% readproctit
> 
> Both unbound and python accept DNS requests, and it seems that when 
> interrupt time is 25%, only unbound is in the *udp state; when it is 
> 50%, both programs are in that state.
