Date:      Mon, 2 May 2016 09:10:30 +0200 (CEST)
From:      Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>
To:        Wolfgang Zenker <wolfgang@lyxys.ka.sub.org>
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: Recent stable: bsnmpd eats up memory and cpu
Message-ID:  <alpine.BSF.2.20.1605020846470.1206@mail.fig.ol.no>
In-Reply-To: <20160501220107.GA58930@lyxys.ka.sub.org>
References:  <20160501220107.GA58930@lyxys.ka.sub.org>

On Mon, 2 May 2016 00:01+0200, Wolfgang Zenker wrote:

> Hi,
> 
> after updating some 10-STABLE systems a few days ago, I noticed that on
> two of those systems bsnmpd started to use up a lot of cpu time, and the
> available memory shrank until the system became unusable. Killing
> bsnmpd stops the cpu usage but does not free up memory.
> Both affected systems are amd64, one having moved from r297555 to
> r298723, the other from r297555 to r298722. Another amd64 system
> that went from r297555 to r298722 appears not to be affected.
> The two affected systems are on an internal LAN segment and there
> is currently no application connecting to snmp on those machines.
> 
> What would be useful debugging data to collect in this case?

I believe I've seen the very same thing on my systems. All of them were
updated last Friday due to the recent NTP fix. Prior to last Friday,
they all ran stable/10 from early March, r296648-ish. None of them
runs bsnmpd, but they offer a lot of network services.

Three of my i386 systems, each with 1 GiB of memory, ran out of swap
space on Sunday afternoon.

Last night a mail server running i386 with 4 GiB of memory died while
handling mail. From the messages I could glean on /dev/ttyvb (thanks to
custom logging) before rebooting, it all appears to be networking related.

SpamAssassin and syslogd on the mail server managed to transmit these 
lines to the central log host before dying:

May  2 00:05:17 <mail.err> [HOSTNAME] spamc[63613]: connect to spamd on ::1 failed, retrying (#1 of 3): Connection refused
May  2 00:05:17 <mail.err> [HOSTNAME] spamc[63613]: connect to spamd on 127.0.0.1 failed, retrying (#1 of 3): Connection refused
May  2 00:05:18 <mail.err> [HOSTNAME] spamc[63613]: connect to spamd on ::1 failed, retrying (#2 of 3): Connection refused
May  2 00:05:18 <mail.err> [HOSTNAME] spamc[63613]: connect to spamd on 127.0.0.1 failed, retrying (#2 of 3): Connection refused
May  2 00:05:19 <mail.err> [HOSTNAME] spamc[63613]: connect to spamd on ::1 failed, retrying (#3 of 3): Connection refused
May  2 00:05:19 <mail.err> [HOSTNAME] spamc[63613]: connect to spamd on 127.0.0.1 failed, retrying (#3 of 3): Connection refused
May  2 00:05:19 <mail.err> [HOSTNAME] spamc[63613]: connection attempt to spamd aborted after 3 retries

May  2 00:52:17 <mail.err> [HOSTNAME] sm-mta[63740]: u41Mp86h063740: Milter (spamassassin): error creating socket: No buffer space available
May  2 00:52:17 <mail.err> [HOSTNAME] sm-mta[63739]: u41Mp8r9063739: Milter (spamassassin): error creating socket: No buffer space available
May  2 00:52:17 <mail.info> [HOSTNAME] sm-mta[63740]: u41Mp86h063740: Milter (spamassassin): to error state
May  2 00:52:17 <mail.info> [HOSTNAME] sm-mta[63739]: u41Mp8r9063739: Milter (spamassassin): to error state
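
To see how often each daemon hits this error on the central log host, the lines can be tallied with a short script. What follows is only a minimal sketch in Python: the regular expression assumes the exact line layout shown in the excerpt above, and the names LINE_RE and count_errors are made up for illustration.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log excerpt above:
#   "May  2 00:52:17 <mail.err> [HOSTNAME] sm-mta[63740]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s"        # timestamp, e.g. "May  2 00:52:17"
    r"<(?P<facility>\w+)\.(?P<level>\w+)>\s"   # facility.level, e.g. "<mail.err>"
    r"\S+\s"                                   # host field, e.g. "[HOSTNAME]"
    r"(?P<prog>[\w.-]+)\[(?P<pid>\d+)\]:\s"    # program[pid]:
    r"(?P<msg>.*)$"                            # rest of the line
)


def count_errors(lines, needle="No buffer space available"):
    """Count, per program name, the lines whose message contains `needle`."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and needle in m.group("msg"):
            hits[m.group("prog")] += 1
    return hits


# A few lines copied from the excerpt above.
sample = [
    "May  2 00:05:19 <mail.err> [HOSTNAME] spamc[63613]: connection attempt to spamd aborted after 3 retries",
    "May  2 00:52:17 <mail.err> [HOSTNAME] sm-mta[63740]: u41Mp86h063740: Milter (spamassassin): error creating socket: No buffer space available",
    "May  2 00:52:17 <mail.err> [HOSTNAME] sm-mta[63739]: u41Mp8r9063739: Milter (spamassassin): error creating socket: No buffer space available",
    "May  2 00:52:17 <mail.info> [HOSTNAME] sm-mta[63740]: u41Mp86h063740: Milter (spamassassin): to error state",
]

if __name__ == "__main__":
    for prog, n in count_errors(sample).items():
        print(prog, n)
```

Passing a different needle, such as "Connection refused", tallies the spamc failures the same way.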

All of the amd64 systems with 4 GiB or 8 GiB of memory are apparently 
unaffected.

Maybe it's time to convert the remaining i386 systems to amd64 
systems, and add some memory while I'm at it.

The bug is either in the kernel or in libc, or both.

-- 
+-------------------------------+------------------------------------+
| Vennlig hilsen,               | Best regards,                      |
| Trond Endrestøl,              | Trond Endrestøl,                   |
| IT-ansvarlig,                 | System administrator,              |
| Fagskolen Innlandet,          | Gjøvik Technical College, Norway,  |
| tlf. mob.   952 62 567,       | Cellular...: +47 952 62 567,       |
| sentralbord 61 14 54 00.      | Switchboard: +47 61 14 54 00.      |
+-------------------------------+------------------------------------+

Date:      Mon, 2 May 2016 21:30:10 +1000
From:      Andrew Reilly <areilly@bigpond.net.au>
To:        freebsd-stable@freebsd.org
Subject:   Did anything change WRT Jail network access in the last week or so? (10.3-STABLE #17 r298791)
Message-ID:  <A969288F-1497-4722-86C9-2F79E00009E3@bigpond.net.au>


Hi all,

Some time ago I resorted to setting up a Jail to support my
SqueezeBox system: the version in ports (audio/squeezeboxserver)
is not current, and needs an old version of mysql and an old
version of perl.  A Jail seemed like the right answer.  For a
while it worked OK (for small values of OK), but in the last
week, perhaps with my most recent weekly upgrade to STABLE
(revision as in the subject line), the wheels have fallen off: the
player devices no longer seem able to complete whatever
network boot procedure they perform against the server.  One of them
has been power-cycled and seems dead to the world, the other
is still running from its last boot, but claims not to be able
to "see" the server.  The server can't see either of them.  I
assume that some sort of proprietary broadcast protocol is
involved in this discovery process, although the devices acquire
IP addresses from my 10.3 server's DHCPD.

My jail configuration (in /etc/jail.conf) is:

SB {
        host.hostname = "SB.reilly.home";
        path = "/usr/home/SB";
        ip4.addr += "10.0.0.26/24";
        allow.raw_sockets = 1;
        exec.clean;
        exec.system_user = "root";
        exec.jail_user = "root";
        exec.start += "/bin/sh /etc/rc";
        exec.stop = "/bin/sh /etc/rc.shutdown";
        exec.consolelog = "/var/log/jail_SB_console.log";
        mount.devfs;
        allow.set_hostname = 0;
        allow.sysvipc = 0;
}

I believe that the "allow.raw_sockets = 1;" line is the part
that had previously allowed the auto-discovery protocol to work.

I'm not sure if it's redundant or not, but I also have the
following line in my /etc/rc.conf:

ifconfig_re0_alias0="inet 10.0.0.26 netmask 0xffffff00"

FWIW the host that this jail is running on is at 10.0.0.2/24.
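
As a quick sanity check of the addressing above, here is a minimal sketch using Python's standard ipaddress module. It only confirms that the jail alias and the host share a subnet and shows which broadcast address that subnet implies. Whether the players really rely on broadcast for discovery remains an assumption, and the helper name interface_from_hex_mask is made up for illustration.

```python
import ipaddress


def interface_from_hex_mask(addr, hex_netmask):
    """Build an interface object from an address and an rc.conf-style hex netmask."""
    # ip_address() accepts an integer, so 0xffffff00 becomes 255.255.255.0.
    mask = ipaddress.ip_address(hex_netmask)
    return ipaddress.ip_interface(f"{addr}/{mask}")


# Values copied from the message: host address, jail alias and its netmask.
host = ipaddress.ip_interface("10.0.0.2/24")
jail = interface_from_hex_mask("10.0.0.26", 0xFFFFFF00)

same_subnet = host.network == jail.network
broadcast = jail.network.broadcast_address

if __name__ == "__main__":
    print("jail interface:", jail.with_prefixlen)
    print("same subnet as host:", same_subnet)
    print("subnet broadcast address:", broadcast)
```

If the two networks differed, a mismatch between the jail.conf entry and the rc.conf alias would be the first thing to rule out.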

As I said above, this was all working up to a week or so ago,
and all I've done in the meantime is a base upgrade and a
portmaster upgrade of installed ports (not the jail ports: they
haven't changed since installed.)

Cheers,

-- 
Andrew



