From: Scott Lambert
To: freebsd-ports@freebsd.org
Date: Mon, 14 Sep 2009 22:08:13 -0500
Subject: Nagios SIGSEGV on FreeBSD 8
Message-ID: <20090915030813.GB66091@sysmon.tcworks.net>

I've been running a FreeBSD 8-BETA2 server for DNS on a network I recently took over. No problems.

We needed to get Nagios running on that network to watch all the hosts in RFC 1918 space. Taking the easy route, I just installed the Nagios 3.0.6 port on this 8-BETA2 box. Nagios runs great until someone acknowledges a down host (which adds a comment).
Later, when the host comes back up, Nagios exits on a SIGSEGV. It seems to happen only when the retention data (retention.dat) shows the host as down. If we restart Nagios without removing retention.dat, it gets a SIGSEGV again the next time it tries to mark the host up. I upgraded to the nagios-devel (Nagios 3.1.2) port and the problem persists.

I'm not good with gdb, but it looks like there are two threads running. I can't tell what the other thread is doing, but the one that SEGVs seems to be trying to remove the comment associated with the acknowledgement message. Here's what gdb shows for the core file:

sudo gdb -c /var/coredumps/nagios-52050.core /usr/local/bin/nagios
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...(no debugging symbols found)...
Core was generated by `nagios'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0807fe8b in get_next_comment_by_host ()
[New Thread 28326280 (LWP 100051)]
[New Thread 28301140 (LWP 100222)]
(gdb) bt
#0  0x0807fe8b in get_next_comment_by_host ()
#1  0x08080940 in delete_host_acknowledgement_comments ()
#2  0x28331180 in ?? ()
#3  0x4aaac053 in ?? ()
#4  0x080cc394 in __JCR_LIST__ ()
#5  0x28342f00 in ?? ()
#6  0x00000000 in ?? ()
#7  0xbfbfe858 in ?? ()
#8  0x08071c15 in handle_host_state ()
Previous frame inner to this frame (corrupt stack?)

Here is the code for get_next_comment_by_host():

comment *get_next_comment_by_host(char *host_name, comment *start){
    comment *temp_comment=NULL;

    if(host_name==NULL || comment_hashlist==NULL)
        return NULL;

    if(start==NULL)
        temp_comment=comment_hashlist[hashfunc(host_name,NULL,COMMENT_HASHSLOTS)];
    else
        temp_comment=start->nexthash;

    for(;temp_comment && compare_hashdata(temp_comment->host_name,NULL,host_name,NULL)<0;temp_comment=temp_comment->nexthash);

    if(temp_comment && compare_hashdata(temp_comment->host_name,NULL,host_name,NULL)==0)
        return temp_comment;

    return NULL;
}

I don't grok the for loop, but I'm not much of a C guy. I think it's a while loop in disguise. I'm guessing that if the hashfunc() or compare_hashdata() calls were the problem, they would show up in the backtrace?

The reason I'm asking here is that I haven't found any reports of similar issues on the Nagios list or elsewhere via Google. I suspect the issue may have to do with threads on FreeBSD 8, but I need more clue to figure out whether that suspicion could be correct.

I must be the first sucker to try to run Nagios on FreeBSD 8. :-)

Thanks,

-- 
Scott Lambert                    KC5MLE                       Unix SysAdmin
lambert@lambertfam.org
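The `for` loop in get_next_comment_by_host() has an empty body (the trailing `;`) and only advances `temp_comment`, so it is indeed equivalent to a while loop. The sketch below puts both forms side by side. It is not Nagios code: the stripped-down `comment` struct and the `compare_host()` helper are hypothetical stand-ins, and it assumes that compare_hashdata() called with NULL second arguments orders host names the way strcmp() does, and that each hash chain is kept sorted by host name (which is what the loop's `< 0` early exit implies).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for the Nagios comment struct:
   only the two fields the loop touches. */
typedef struct comment {
    const char *host_name;
    struct comment *nexthash;   /* next entry in the same hash chain */
} comment;

/* Hypothetical stand-in for compare_hashdata(a, NULL, b, NULL);
   assumed to order host names like strcmp(). */
static int compare_host(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* The loop as written in get_next_comment_by_host(): a for loop whose
   body is empty; all the work happens in the condition and increment. */
static comment *scan_for(comment *temp_comment, const char *host_name)
{
    for (; temp_comment && compare_host(temp_comment->host_name, host_name) < 0;
         temp_comment = temp_comment->nexthash)
        ;
    if (temp_comment && compare_host(temp_comment->host_name, host_name) == 0)
        return temp_comment;
    return NULL;
}

/* The same logic spelled out as a while loop. */
static comment *scan_while(comment *temp_comment, const char *host_name)
{
    /* Skip entries whose host name sorts before the one we want
       (the chain is assumed to be sorted). */
    while (temp_comment != NULL &&
           compare_host(temp_comment->host_name, host_name) < 0)
        temp_comment = temp_comment->nexthash;

    /* Stopped on an exact match: this is the next comment for the host. */
    if (temp_comment != NULL &&
        compare_host(temp_comment->host_name, host_name) == 0)
        return temp_comment;

    /* Ran off the end of the chain, or hit a name that sorts after
       host_name, so there are no more comments for this host. */
    return NULL;
}
```

Either version dereferences the node it is looking at (`start->nexthash`, `temp_comment->host_name`) inside get_next_comment_by_host() itself, before compare_hashdata() is entered. So one possibility is that a stale or garbage `start`/`nexthash` pointer faults in that frame without compare_hashdata() ever showing up in the backtrace.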