To: Tim Kientzle
From: Ian FREISLICH
In-Reply-To: <4ABAF113.9030904@freebsd.org>
References: <4ABAF113.9030904@freebsd.org> <20090922212905.GA77503@sysmon.tcworks.net>
Date: Thu, 08 Oct 2009 15:08:34 +0200
Cc: freebsd-current@freebsd.org
Subject: Re: Nagios SIGSEGV on FreeBSD 8

Tim Kientzle wrote:
> Scott Lambert wrote:
> > I've posted this to FreeBSD-ports and Nagios-Users without a nibble.
> >
> > [New Thread 28326280 (LWP 100051)]
> > [New Thread 28301140 (LWP 100222)]
> > (gdb) bt
> > #0 0x0807fe8b in get_next_comment_by_host ()
> > #1 0x08080940 in delete_host_acknowledgement_comments ()
> > #2 0x28331180 in ?? ()
> > #3 0x4aaac053 in ?? ()
> > #4 0x080cc394 in __JCR_LIST__ ()
>
> Build with debug symbols and try again; maybe you can get
> more detail.
> Also, check a couple of core dumps to
> see if it's crashing in the same place; that might
> also give a clue.
>
> Do the "New Thread" messages mean that Nagios is running
> multiple threads? If so, I wonder what the other
> thread is doing?

We've been trying to combat a performance issue in Nagios. One thread
handles incoming events (nsca etc.) and data from the nagios.cmd pipe
file, and writes files for processing in
/var/spool/nagios/checkresults. The other thread processes these files
and updates the host state and other data.

The threading broke profiling (I think), because when Nagios was
compiled with -pg it did no more than read its configuration. Even
that alone pointed to the area where Nagios is poorly optimised:
string processing. Reading our configuration resulted in 65000000
calls to strcmp. 65 million!!

We're battling to keep up with passive events from about 5000 hosts
every few minutes. The nagios.cmd thread struggles to keep up reading
from the fifo when there are about 4000 writers. And the worker thread
struggles to parse the checkresults files. They're big, but not *that*
big: maybe 80k to 120k lines, which take it about 7 minutes to parse.

We also had to raise kern.maxusers="1024" and
kern.ipc.nmbclusters="131072" to prevent the system from being starved
of network resources.

Ian

--
Ian Freislich
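
A minimal sketch of how a strcmp count of that size can arise while
reading a configuration. It assumes, purely for illustration, that each
host name is resolved by walking a flat list and comparing names one at
a time; that is an assumption, not something checked against the Nagios
source. The names counted_strcmp, find_host_linear and
count_linear_lookups are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Global counter so the number of strcmp() calls can be observed. */
static unsigned long strcmp_calls = 0;

/* Thin wrapper around strcmp() that counts every call. */
static int counted_strcmp(const char *a, const char *b)
{
    strcmp_calls++;
    return strcmp(a, b);
}

/*
 * Hypothetical linear lookup (an assumption, not Nagios code):
 * compare the wanted name against each entry until one matches.
 * Returns the index of the match, or -1 if there is none.
 */
static int find_host_linear(char names[][16], int n, const char *wanted)
{
    for (int i = 0; i < n; i++) {
        if (counted_strcmp(names[i], wanted) == 0)
            return i;
    }
    return -1;
}

/*
 * Generate n host names, look each one up once with the linear
 * search, and return the total number of strcmp() calls made.
 */
unsigned long count_linear_lookups(int n)
{
    enum { MAX_HOSTS = 8192 };
    static char names[MAX_HOSTS][16];

    assert(n > 0 && n <= MAX_HOSTS);

    for (int i = 0; i < n; i++)
        snprintf(names[i], sizeof names[i], "host%05d", i);

    strcmp_calls = 0;
    for (int i = 0; i < n; i++) {
        int idx = find_host_linear(names, n, names[i]);
        assert(idx == i);   /* every generated name must be found */
        (void)idx;
    }
    return strcmp_calls;
}
```

Under that assumption, resolving each of n names once costs n(n+1)/2
comparisons, so count_linear_lookups(5000) returns 12502500. A few such
passes over roughly 5000 hosts would already reach tens of millions of
calls. A hash table keyed on the name would typically cut each lookup
to about one comparison.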