Date:      Sat, 20 Oct 2012 19:33:12 -0700
From:      Marcel Moolenaar <marcel@xcllnt.net>
To:        Alan Cox <alc@rice.edu>
Cc:        Poul-Henning Kamp <phk@phk.freebsd.dk>, "freebsd-arch@freebsd.org Arch" <freebsd-arch@freebsd.org>, Tim LaBerge <tlaberge@juniper.net>, Jason Evans <jasone@freebsd.org>
Subject:   Re: Behavior of madvise(MADV_FREE)
Message-ID:  <8D2E1B5A-6DD1-49E3-8F55-B3B816449FFB@xcllnt.net>
In-Reply-To: <5082F0F3.1070102@rice.edu>
References:  <9FEBC10C-C453-41BE-8829-34E830585E90@xcllnt.net> <4835.1350062021@critter.freebsd.dk> <E6A52D27-0D6A-4175-9ECA-ADE25BFF35C2@xcllnt.net> <F71ACE9D-297E-4565-BB8D-D95D46D90708@freebsd.org> <F67D539D-8BE3-4817-8466-C76DE43AE252@xcllnt.net> <5082F0F3.1070102@rice.edu>

On Oct 20, 2012, at 11:44 AM, Alan Cox <alc@rice.edu> wrote:
>
>> Also, moving the complexity of exactly which hint to give the
>> kernel under different scenarios isn't that appealing at all.
>> It just doesn't scale.
>
>
> I think that you're being a bit too pessimistic here.  If your use
> case really corresponds to "this memory is free and will not be reused
> (or reallocated for a very long time)", then that is qualitatively very
> different from the way malloc(3) uses MADV_FREE.  malloc(3)'s use of
> MADV_FREE is highly speculative.  It doesn't really know what the
> application is going to do in the future.  I don't think that having two
> distinct hints that distinguish between "speculative" and
> "non-speculative" uses would be problematic.  The distinction is real
> and also easy to explain.  The only danger is that application writers
> really don't understand their application and use the wrong hint.

Maybe. I need to think about this. On the surface it's hard to
believe that any allocator can reliably predict the future, so
all hints are speculative in that sense. I do buy into the fact
that malloc(3) has no a priori knowledge of the behavior of an
application and an application with a special-purpose allocator
has an allocator with more knowledge of the behavior. I'm just
not sure this warrants different hints.

I agree that the more complicated the hints, the more likely they
are to go unused or to be used the wrong way.

>
>> ... If some VM changes warrant a new hint
>> to madvise(), you may end up changing multiple daemons. It
>> seems better to have just 1 hint (i.e. MADV_FREE) and have the
>> kernel change its behaviour depending on the situation. When
>> there's plenty of memory, you may even ignore the hint. Under
>> severe memory pressure you may want to free up the page right
>> away so that you can give it to some thread that's waiting
>> for a page.
>
>
> How is this really different from the existing behavior?  If a thread
> is waiting for a page, then the page daemon is running.  In particular,
> it is moving pages from the head of the inactive queue, where they were
> placed by MADV_FREE, to the cache/free queue and waking up the waiting
> thread when the aggregate cache/free target is met.

What we see with FreeBSD 6.1 is that memory remains inactive
indefinitely. If the behaviour has changed in more recent
versions, then we'll reap the benefits soon. If not, then we
(= Juniper) may want to look into this.


>>  At the edge of needing to swap, complex algorithms
>> may be worthwhile -- or maybe not. I don't know.
>>
>> This leads to:
>> 1.  Keep MADV_FREE as it behaves in FreeBSD right now or make
>>     it even more sloppy.
>
>
> I'm not sure that I understand what you mean by "sloppy" here.  Can
> you elaborate?

It's just a sloppy way of saying that the hint can be ignored
altogether or that we simply mark the page as clean and not
do anything else. The point was mostly that the performance
argument is more important.
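
As a rough illustration of the "hint can be ignored" semantics described
above, here is a minimal C sketch. It assumes a system whose <sys/mman.h>
provides madvise(2) and MADV_FREE (FreeBSD does; other systems may not,
hence the #ifdef). The helper name free_hint_demo and its hint_rc
parameter are hypothetical, made up for this example. The sketch only
shows that the range stays mapped and usable whether or not the kernel
acts on the hint; it says nothing about when pages are reclaimed.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* expose madvise()/MAP_ANON on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory, dirty it,
 * hint that the contents are no longer needed, then reuse the range.
 * If hint_rc is non-NULL it receives madvise()'s return value.
 * Returns 0 if the range was still usable after the hint, -1 otherwise.
 */
int
free_hint_demo(size_t len, int *hint_rc)
{
	unsigned char *p;
	int rc = 0, usable;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (p == MAP_FAILED)
		return (-1);

	memset(p, 0xA5, len);           /* make the pages dirty */

#ifdef MADV_FREE
	/* Advisory only: the kernel may act on it late or not at all. */
	rc = madvise(p, len, MADV_FREE);
#endif
	if (hint_rc != NULL)
		*hint_rc = rc;

	/* The mapping is still valid; writing simply reuses the pages. */
	memset(p, 0x5A, len);
	usable = (p[0] == 0x5A && p[len - 1] == 0x5A);

	munmap(p, len);
	return (usable ? 0 : -1);
}
```

A caller would use it as free_hint_demo(64 * 1024, &rc). A failed hint
is deliberately not treated as an error, since the application has to
work either way.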


>> 2.  Have an idle thread that moves inactive pages to the cache
>>     or free queue if they've been inactive for X minutes, for
>>     some tunable X. Have it back off when the pageout daemon
>>     kicks in.
>
>
> The existing page daemon already wakes up periodically and looks
> around for something to do.  In particular, have a look at
> vm_pageout_page_stats().  That function tries to do something analogous
> to what you propose.  In part, it tries to prevent munmap(2)ed
> file-backed pages from getting stuck in the active queue.

I'll take a look. That's good to know.
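
For concreteness, the policy proposed in item 2 could be modeled in user
space roughly as follows. This is a toy sketch, not FreeBSD VM code and
not what vm_pageout_page_stats() does: struct toy_page, enum toy_queue,
and idle_sweep() are all hypothetical names, and the "queues" are just a
field on each array element.

```c
#include <stddef.h>

/* Toy model only; none of these names exist in the FreeBSD kernel. */
enum toy_queue { Q_INACTIVE, Q_FREE };

struct toy_page {
	enum toy_queue queue;
	long inactive_since;    /* time (seconds) the page became inactive */
};

/*
 * Move every page that has been inactive for at least max_idle seconds
 * to the free queue.  Back off entirely while the pageout daemon is
 * active, as the proposal suggests.  Returns the number of pages moved.
 */
size_t
idle_sweep(struct toy_page *pages, size_t n, long now, long max_idle,
    int pageout_active)
{
	size_t i, moved = 0;

	if (pageout_active)
		return (0);             /* let the pageout daemon do its job */

	for (i = 0; i < n; i++) {
		if (pages[i].queue == Q_INACTIVE &&
		    now - pages[i].inactive_since >= max_idle) {
			pages[i].queue = Q_FREE;
			moved++;
		}
	}
	return (moved);
}
```

Here max_idle plays the role of the tunable X, expressed in seconds
rather than minutes.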

\begin{disclaimer}
Juniper's problem is being stuck with an obsolete version of
FreeBSD and we're likely to look for solutions to problems that
don't exist anymore in recent versions. Just bear with us for
a while :-)
\end{disclaimer}

-- 
Marcel Moolenaar
marcel@xcllnt.net




