Date: Sun, 18 Mar 2007 23:05:10 -0400 (EDT)
From: Mike Andrews <mandrews@bit0.com>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: ports/110498: net-snmp proc monitoring randomly fails
Message-ID: <20070319030510.5020B730003@mindcrime.bit0.com>
Resent-Message-ID: <200703190340.l2J3e4Ja085620@freefall.freebsd.org>

>Number:         110498
>Category:       ports
>Synopsis:       net-snmp proc monitoring randomly fails
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-ports-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Mar 19 03:40:04 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Mike Andrews
>Release:        FreeBSD 6.2-RELEASE-p2 amd64
>Organization:   Fark.com LLC
>Environment:
System: FreeBSD mindcrime.bit0.com 6.2-RELEASE-p2 FreeBSD 6.2-RELEASE-p2 #19: Sun Mar 4 15:16:21 EST 2007
    mandrews@mindcrime.bit0.com:/usr/obj/usr/src/sys/MINDCRIME amd64

>Description:
With net-snmp 5.3.1 on FreeBSD 6.2-RELEASE (i386 or amd64), the "proc"
monitoring facility randomly raises alarms saying that certain processes
are not running (or that too few of them are running) when in fact they
are. The alarms start suddenly, without warning, and clear themselves
several hours later. If you have Nagios checking these alarms, this can be
highly annoying. :)

I'm fairly certain net-snmp 5.2.x and earlier don't have this problem
(I've been using them for years).

The problem is that net-snmp uses /bin/ps to get the list of processes and
writes the output of ps to /var/net-snmp/.snmp-exec-cache. That file is
truncated at 16000 bytes (MAXCACHESIZE, defined as 200*80), which is far
too small for systems running many hundreds of processes at a time; any
process listed past the cutoff is simply not counted. Perhaps net-snmp
5.2.x and earlier used something other than /bin/ps to get the process
list?

I don't have a procfs filesystem mounted. (I did try mounting one to see
whether it would help; it didn't.)

>How-To-Repeat:
bourbon# grep proc /usr/local/share/snmp/snmpd.conf
proc syslogd 1 1
proc httpd
proc ntpd 1 1
proc smartd
proc clamd
proc freshclam
bourbon# ps -U vscan | grep clam
84154 ?? Is 0:00.18 /usr/local/bin/freshclam --daemon -p /var/run/clamav/freshclam.pid
84265 ?? Is 0:04.61 /usr/local/sbin/clamd
bourbon# snmpwalk -v 2c -c ___ localhost .1.3.6.1.4.1.2021.2.1
UCD-SNMP-MIB::prIndex.1 = INTEGER: 1
UCD-SNMP-MIB::prIndex.2 = INTEGER: 2
UCD-SNMP-MIB::prIndex.3 = INTEGER: 3
UCD-SNMP-MIB::prIndex.4 = INTEGER: 4
UCD-SNMP-MIB::prIndex.5 = INTEGER: 5
UCD-SNMP-MIB::prIndex.6 = INTEGER: 6
UCD-SNMP-MIB::prNames.1 = STRING: syslogd
UCD-SNMP-MIB::prNames.2 = STRING: httpd
UCD-SNMP-MIB::prNames.3 = STRING: ntpd
UCD-SNMP-MIB::prNames.4 = STRING: smartd
UCD-SNMP-MIB::prNames.5 = STRING: clamd
UCD-SNMP-MIB::prNames.6 = STRING: freshclam
UCD-SNMP-MIB::prMin.1 = INTEGER: 1
UCD-SNMP-MIB::prMin.2 = INTEGER: 0
UCD-SNMP-MIB::prMin.3 = INTEGER: 1
UCD-SNMP-MIB::prMin.4 = INTEGER: 0
UCD-SNMP-MIB::prMin.5 = INTEGER: 0
UCD-SNMP-MIB::prMin.6 = INTEGER: 0
UCD-SNMP-MIB::prMax.1 = INTEGER: 1
UCD-SNMP-MIB::prMax.2 = INTEGER: 0
UCD-SNMP-MIB::prMax.3 = INTEGER: 1
UCD-SNMP-MIB::prMax.4 = INTEGER: 0
UCD-SNMP-MIB::prMax.5 = INTEGER: 0
UCD-SNMP-MIB::prMax.6 = INTEGER: 0
UCD-SNMP-MIB::prCount.1 = INTEGER: 1
UCD-SNMP-MIB::prCount.2 = INTEGER: 345
UCD-SNMP-MIB::prCount.3 = INTEGER: 1
UCD-SNMP-MIB::prCount.4 = INTEGER: 1
UCD-SNMP-MIB::prCount.5 = INTEGER: 0
UCD-SNMP-MIB::prCount.6 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.1 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.2 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.3 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.4 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.5 = INTEGER: 1
UCD-SNMP-MIB::prErrorFlag.6 = INTEGER: 1
UCD-SNMP-MIB::prErrMessage.1 = STRING: 
UCD-SNMP-MIB::prErrMessage.2 = STRING: 
UCD-SNMP-MIB::prErrMessage.3 = STRING: 
UCD-SNMP-MIB::prErrMessage.4 = STRING: 
UCD-SNMP-MIB::prErrMessage.5 = STRING: No clamd process running.
UCD-SNMP-MIB::prErrMessage.6 = STRING: No freshclam process running.
UCD-SNMP-MIB::prErrFix.1 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.2 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.3 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.4 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.5 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.6 = INTEGER: 0
UCD-SNMP-MIB::prErrFixCmd.1 = STRING: 
UCD-SNMP-MIB::prErrFixCmd.2 = STRING: 
UCD-SNMP-MIB::prErrFixCmd.3 = STRING: 
UCD-SNMP-MIB::prErrFixCmd.4 = STRING: 
UCD-SNMP-MIB::prErrFixCmd.5 = STRING: 
UCD-SNMP-MIB::prErrFixCmd.6 = STRING: 
bourbon# ps -U vscan | grep clam
84154 ?? Is 0:00.18 /usr/local/bin/freshclam --daemon -p /var/run/clamav/freshclam.pid
84265 ?? Is 0:04.61 /usr/local/sbin/clamd
bourbon# ps -acx | grep httpd | wc
744 3720 23808

(744 > 345) ;-)

>Fix:
Try the following patch, which raises MAXCACHESIZE from 200*80 to 1500*80
bytes, though only the second half of it (net-snmp-config.h.in) seems to
actually fix the problem:

*** acconfig.h.orig Fri May 26 12:36:06 2006
--- acconfig.h Sun Mar 18 22:24:27 2007
***************
*** 488,494 ****
  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (200*80) /* roughly 200 lines max */
  /* misc defaults */
--- 488,494 ----
  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (1500*80) /* roughly 1500 lines max */
  /* misc defaults */
*** include/net-snmp/net-snmp-config.h.in.orig Fri May 26 12:36:06 2006
--- include/net-snmp/net-snmp-config.h.in Sun Mar 18 22:54:13 2007
***************
*** 1334,1340 ****
  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (200*80) /* roughly 200 lines max */
  /* misc defaults */
--- 1334,1340 ----
  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (1500*80) /* roughly 1500 lines max */
  /* misc defaults */

>Release-Note:
>Audit-Trail:
>Unformatted:
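A simplified model of the failure described in this report may help when reviewing the patch. It is a sketch under stated assumptions, not net-snmp's actual code. It assumes that the cached ps output is simply cut off at MAXCACHESIZE bytes, that an incomplete last line is discarded, and that the ps output shows only the command name (as ps -c does), so a process counts when the last field of a line equals the name given to the proc directive. The helpers count_procs_in_cache() and lines_that_fit() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cache limits from the patch: the 5.3.1 value and the proposed value. */
#define OLD_MAXCACHESIZE (200 * 80)  /* 16000 bytes, "roughly 200 lines" */
#define NEW_MAXCACHESIZE (1500 * 80) /* 120000 bytes, "roughly 1500 lines" */

/*
 * Hypothetical helper (not part of net-snmp): count the ps lines whose last
 * space-separated field equals `name`, looking only at the first `limit`
 * bytes of `out`, as if the exec cache had been truncated there.  A final
 * line that was cut off before its newline is ignored.
 */
size_t count_procs_in_cache(const char *out, size_t len, size_t limit,
                            const char *name)
{
    size_t n = len < limit ? len : limit; /* bytes that survive truncation */
    size_t namelen = strlen(name);
    size_t count = 0;
    size_t start = 0; /* offset of the current line */

    for (size_t i = 0; i < n; i++) {
        if (out[i] != '\n')
            continue;
        /* Trim trailing blanks, then step back to the start of the last field. */
        size_t end = i;
        while (end > start && out[end - 1] == ' ')
            end--;
        size_t field = end;
        while (field > start && out[field - 1] != ' ')
            field--;
        if (end - field == namelen && memcmp(out + field, name, namelen) == 0)
            count++;
        start = i + 1; /* next line begins after the newline */
    }
    return count;
}

/* Hypothetical helper: how many lines of `line_len` bytes fit in `limit` bytes. */
size_t lines_that_fit(size_t limit, size_t line_len)
{
    assert(line_len > 0);
    return limit / line_len;
}
```

Using the report's own numbers, ps -acx | grep httpd alone produced 744 lines totalling 23808 bytes, i.e. 32 bytes per line. In this model lines_that_fit(OLD_MAXCACHESIZE, 32) is 500, so the 16000-byte cache cannot hold even the httpd lines, while lines_that_fit(NEW_MAXCACHESIZE, 32) is 3750. Processes that happen to be listed after the cutoff (as clamd and freshclam presumably were here) are then reported as not running, even though ps shows them.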