Date:      Thu, 3 Apr 2014 12:30:40 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Ian Lepore <ian@freebsd.org>
Cc:        freebsd-hackers@freebsd.org, Dmitry Sivachenko <trtrmitya@gmail.com>, Trond Endrestøl <Trond.Endrestol@fagskolen.gjovik.no>
Subject:   Re: madvise() vs posix_fadvise()
Message-ID:  <201404031230.40380.jhb@freebsd.org>
In-Reply-To: <1396539837.81853.278.camel@revolution.hippie.lan>
References:  <D6BD48AF-9522-495D-8D54-37854E53C272@gmail.com> <201404031102.38598.jhb@freebsd.org> <1396539837.81853.278.camel@revolution.hippie.lan>

On Thursday, April 03, 2014 11:43:57 am Ian Lepore wrote:
> On Thu, 2014-04-03 at 11:02 -0400, John Baldwin wrote:
> > On Thursday, April 03, 2014 7:29:03 am Dmitry Sivachenko wrote:
> > >
> > > On 27 March 2014, at 19:41, John Baldwin <jhb@FreeBSD.org> wrote:
> > > >>
> > > >> I know about mlock(2), it is a bit overkill.
> > > >> Can someone please explain the difference between
> > > >> madvise(MADV_WILLNEED) and posix_fadvise(POSIX_FADV_WILLNEED)?
> > > >
> > > > Right now FADV_WILLNEED is a nop.  (I have some patches to implement it for
> > > > UFS.)  I can't recall off the top of my head if MADV_WILLNEED is also a nop.
> > > > However, if both are fully implemented they should be similar in terms of
> > > > requesting async read-ahead.  MADV_WILLNEED might also conceivably
> > > > pre-create PTEs while FADV_WILLNEED can be used on a file that isn't
> > > > mapped but is accessed via read(2).
> > > >
> > >
> > >
> > > Hello and thanks for your reply.
> > >
> > > Right now I am facing the following problem (stable/10):
> > > There is a (home-grown) webserver which mmaps a large number of data
> > > files (total size is a bit below the size of RAM, say ~90GB of files
> > > with 128GB of RAM).
> > > The server writes access.log (several gigabytes per day).
> > >
> > > Some of the mmapped data files are used frequently, some are used
> > > rarely.  On startup, the server walks through all of these data files
> > > so their contents are read from disk.
> > >
> > > After running for some time, I see that rarely used data files are
> > > purged from RAM (access to them leads to long-running disk reads) in
> > > favour of disk cache (at 0:00, when I rotate and gzip the log file, I
> > > see Inactive memory go down by the size of the log file).
> > >
> > > Is there any way to tell the VM system not to push mmapped regions out
> > > of RAM in favour of disk caches?
> >
> > Use POSIX_FADV_NOREUSE with posix_fadvise() for the log files.  They are a
> > perfect use case for this flag.  This will tell the VM system to throw away
> > the log data (move it to cache) after it writes the file.
> >
> > --
> > John Baldwin
>
> Does that work well in the case of something like /var/log/messages that
> is repeatedly appended-to at random intervals?  It would be bad if every
> new line written to the log triggered a physical read-modify-write.  On
> the other hand if it somehow results in the last / partial block being
> the only one likely to stay in memory, that would be perfect.

The latter.  It's sort of like a lazy O_DIRECT.  Each time you call write(2),
it tries to move any clean pages from your current sequentially written
stream from inactive to cache, so the pages won't move until a subsequent
write(2) after bufdaemon or the syncer actually forces them to be written.
Unfortunately, it is currently implemented by doing an internal
FADV_DONTNEED after each read() or write().  It would be better if it was
implemented as a callback when buffers are completed.
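
As a rough illustration of the log-file case discussed above, here is a
minimal sketch of setting the hint once after open(2).  The helper names
open_log_noreuse() and log_line() are made up for this example; only
open(2), write(2) and posix_fadvise(2) are real interfaces.  How much the
hint helps depends on the kernel and filesystem, so treat it as advisory.

```c
#define _POSIX_C_SOURCE 200809L	/* expose posix_fadvise() under strict modes */

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a log file for appending and hint that its
 * data will not be reused.  Returns the descriptor, or -1 if open(2)
 * fails.  posix_fadvise() returns an error number directly rather than
 * setting errno; a failed hint is reported but is not treated as fatal.
 */
int
open_log_noreuse(const char *path)
{
	int error, fd;

	fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
	if (fd == -1)
		return (-1);

	/* offset 0 with len 0 applies the advice to the whole file. */
	error = posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
	if (error != 0)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(error));
	return (fd);
}

/*
 * Hypothetical helper: append one line.  Returns 0 on success, -1 on a
 * short or failed write.
 */
int
log_line(int fd, const char *line)
{
	size_t len = strlen(line);

	return (write(fd, line, len) == (ssize_t)len ? 0 : -1);
}
```

The advice is attached to the open file description, so it has to be
reapplied to the new descriptor each time the log is rotated and reopened.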

--
John Baldwin
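
For the WILLNEED question quoted earlier in this message, the two hints
might be issued roughly as follows.  This is only a sketch: the helper
names prefetch_for_read() and map_and_prefetch() are invented here, and as
noted in the quoted text either hint may be a nop on a given system, so
neither call guarantees that any read-ahead happens.

```c
#define _DEFAULT_SOURCE	/* glibc: expose madvise(); harmless elsewhere */

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: hint read-ahead for a file that is not mapped and
 * will be accessed with read(2).  Returns 0 or an error number.
 */
int
prefetch_for_read(int fd)
{
	/* offset 0 with len 0 covers the whole file. */
	return (posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED));
}

/*
 * Hypothetical helper: map a whole file read-only and hint that the
 * mapping will be needed soon.  Returns the mapping and stores its
 * length in *lenp, or returns MAP_FAILED (also for an empty file).
 */
void *
map_and_prefetch(int fd, size_t *lenp)
{
	struct stat sb;
	size_t len;
	void *p;

	if (fstat(fd, &sb) == -1 || sb.st_size == 0)
		return (MAP_FAILED);
	len = (size_t)sb.st_size;

	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return (MAP_FAILED);

	/* Advisory only, so a failure here is deliberately ignored. */
	(void)madvise(p, len, MADV_WILLNEED);
	*lenp = len;
	return (p);
}
```

The practical difference is the handle each call takes: posix_fadvise()
works on a descriptor and a file range, while madvise() works on an
address range that must already be mapped.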


