Date:      Thu, 26 Mar 2015 19:39:08 -0400
From:      The Lost Admin <thelostadmin@gmail.com>
To:        "freebsd-questions@freebsd.org" <freebsd-questions@freebsd.org>
Subject:   Re: Significant memory leak in 9.3p10?
Message-ID:  <79371E33-B999-4CAC-A8E4-8D5DDBF043E6@gmail.com>
In-Reply-To: <CABXB=RTe9d0DD68RCi6JWKH%2BcK%2Ba8McmKmeejTypLhVZRc0t7w@mail.gmail.com>
References:  <CABXB=RRhynY5FWvw3tHrLFRyitTemavXYLBpev5Mjs_kPqimXA@mail.gmail.com> <20150316232404.GM2379@kib.kiev.ua> <CABXB=RSt0MgEyoJs4o5utTg7oSu0RZ%2B-czeY0k-Ro%2BfRubK3kQ@mail.gmail.com> <CABXB=RTe9d0DD68RCi6JWKH%2BcK%2Ba8McmKmeejTypLhVZRc0t7w@mail.gmail.com>


The Lost Admin
thelostadmin@gmail.com



On Mar 26, 2015, at 3:46 PM, J David <j.david.lists@gmail.com> wrote:

> On Mon, Mar 16, 2015 at 7:52 PM, J David <j.david.lists@gmail.com> wrote:
>> On Mon, Mar 16, 2015 at 7:24 PM, Konstantin Belousov
>> <kostikbel@gmail.com> wrote:
>>> There are a lot of possibilities to create persistent anonymous shared
>>> memory objects.  Not complete list is tmpfs mounts, swap-backed md disks,
>>> sysv shared memory, possibly posix shared memory (I do not remember which
>>> implementation is used in stable/9).
>>
>> If that's the explanation, how could it be
>> detected/measured/investigated/resolved/prevented?
>>
>> Under ordinary circumstances, machines will go run like this for days/weeks:
>>
>> Mem: 549M Active, 3623M Inact, 567M Wired, 3484K Cache, 827M Buf, 3156M Free
>> Swap: 1024M Total, 1024M Free
>>
>> Then, when this happens, it rapidly degrades from that to so bad that
>> processes start getting killed for being out of swap space.
>
> These FreeBSD machines running out of swap space and dying continues
> to be a daily problem causing outages and unscheduled reboots.  Is
> there really no way to even research what might be causing the
> problem?
>
> (Widening the cross-posting in the hopes of eliciting more help, so
> the brief summary of the problem originally posted to freebsd-stable is
> that an unknown actor consumes all the user-space memory in the
> system, including swap space, to the point where processes are killed
> for being out of swap space, but if every process on the machine is
> stopped, very little of the user-space memory in use is freed.
> Original message with more details is here:
> https://lists.freebsd.org/pipermail/freebsd-stable/2015-March/081986.html
> .)
>
> There are no tmpfs mounts or md disks, so it would have to be one of
> the other causes.  How can FreeBSD's use of persistent, anonymous
> shared memory objects be investigated, measured, or controlled so we
> can get a handle on this issue?

In your initial thread, you said:
$ sudo halt -p
> Waiting (max 60 seconds) for system process `vnlru' to stop...done
> Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
> Waiting (max 60 seconds) for system process `syncer' to stop...
> Syncing disks, vnodes remaining...0 0 0 0 0 0 0 0 0 done
> All buffers synced.  <----- 10 MINUTE HANG AFTER PRINTING THIS
> Uptime: 3d15h56m32s
> usbus0: Controller shutdown
> uhub0: at usbus0, port 1, addr 1 (disconnected)
> usbus0: controller did not stop
> usbus0: Controller shutdown complete
> acpi0: Powering system off
> Connection closed by foreign host.

> So it seems like somewhere after "All buffers synced" and printing the
> uptime, it's very slowly unwinding whatever is using up all that RAM
> and swap.
Have you looked through the system shutdown scripts (part of init/rc) to
see what happens after the uptime is printed? That might give you a lead.

The output from your ps seems to be much shorter than I would expect.
Are you sure it included everything? For example, I would expect to see
processes for cron, syslog, and normally sshd. I've also got a few more
kernel processes that you don't appear to have. Most notable is
pagedaemon.
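
If it helps, here is a rough sh sketch of that check. It reads one
command name per line and prints every expected name it could not find.
The function name missing_procs is made up for illustration, and the
daemon names you pass in (for example syslogd rather than syslog) are my
assumption about a stock install, so adjust them to your machines.

```shell
#!/bin/sh
# Hypothetical helper: report which expected process names are absent
# from a process listing read on stdin (one command name per line).
# The names to look for are passed as arguments.
missing_procs() {
    listing=$(cat)
    for name in "$@"; do
        # -x: match the whole line, -F: treat the name as a fixed string
        if ! printf '%s\n' "$listing" | grep -qxF "$name"; then
            echo "$name"
        fi
    done
}
```

Assuming your ps accepts the comm keyword, something like
`ps -axo comm | missing_procs cron syslogd sshd pagedaemon` should print
nothing when all four are running; any name it prints was not in the
listing.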

For what it's worth, I'm running 9.3 RELEASE-P12 (the -p10 kernel) on a
system 24x7 (6 days since the last reboot) and I haven't had an issue.
It's a low volume NFS server.


