Date:      Thu, 30 Jun 2016 14:06:25 +1000
From:      Paul Koch <paul.koch137@gmail.com>
To:        freebsd-hackers@freebsd.org
Subject:   ZFS ARC and mmap/page cache coherency question
Message-ID:  <20160630140625.3b4aece3@splash.akips.com>

Posted this to -stable on the 15th June, but no feedback...

We are trying to understand a performance issue when syncing large mmap'ed
files on ZFS.

Example test box setup:
 FreeBSD 10.3-p5
 Intel i7-5820K 3.30GHz with 64G RAM
 6 * 2 Tbyte Seagate ST2000DM001-1ER164 in a ZFS stripe

Read performance of a sequentially written large file on the pool is
typically around 950Mbytes/sec using dd.

Our software mmap's some large database files using MAP_NOSYNC, and we call
fsync() every 10 minutes when we know the file system is mostly idle.  In
our test setup, the database files are 1.1G, 2G, 1.4G, 12G, 4.7G and ~20
small files (under 10M).  All of the memory pages in the mmap'ed files are
updated every minute with new values, so the entire mmap'ed file needs to be
synced to disk, not just fragments.
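For readers following along, the mapping and sync pattern described above can be sketched roughly as follows. This is only an illustration, not our actual code: map_db(), touch_all_pages() and flush_db() are hypothetical names, and MAP_NOSYNC is defined to 0 where the platform lacks it (it is FreeBSD-specific) just so the sketch builds elsewhere.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* MAP_NOSYNC is FreeBSD-specific; make it a no-op on other systems. */
#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0
#endif

/* Hypothetical helper: open (or create) a file of len bytes and map it
 * shared with MAP_NOSYNC, so dirty pages are not flushed by the syncer. */
static void *map_db(const char *path, size_t len, int *fd_out)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_NOSYNC, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return p;
}

/* Dirty one byte in every page, standing in for the per-minute update
 * that touches the whole mapping. */
static void touch_all_pages(unsigned char *base, size_t len,
                            unsigned char value)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    for (size_t off = 0; off < len; off += pagesz)
        base[off] = value;
}

/* The periodic flush: push all dirty pages of the file to disk. */
static int flush_db(int fd)
{
    return fsync(fd);
}
```

In this sketch the caller would run touch_all_pages() on its update interval and flush_db() on the sync interval (every 10 minutes in our case).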

When the 10 minute fsync() occurs, gstat typically shows very few disk
reads and very high write speeds, which is what we expect.  But, every 80
minutes we process the data in the large mmap'ed files and store it in highly
compressed blocks of a ~300G file using pread/pwrite (i.e. not mmap'ed).
After that, the performance of the next fsync() of the mmap'ed files falls
off a cliff.  We are assuming it is because the ARC has thrown away the
cached data of the mmap'ed files.  gstat shows lots of read/write contention
and lots of things tend to stall waiting for disk.

Is this just a lack of ZFS ARC and page cache coherency?

Is there a way to prime the ARC with the mmap'ed files again before we call
fsync()?

We've tried cat and read() on the mmap'ed files, but that doesn't seem to
touch the disk at all and the fsync() performance is still poor, so it looks
like the ARC is not being filled.  msync() doesn't seem to be much different.
mincore() stats show the mmap'ed data is entirely incore and referenced.

	Paul.


