Date:      Wed, 3 Sep 2014 11:07:20 +1000
From:      Paul Koch <paul.koch@akips.com>
To:        freebsd-stable@freebsd.org
Subject:   Re: 10.0 interaction with vmware
Message-ID:  <20140903110720.2bd1b373@akips.com>
In-Reply-To: <ltpuji$lus$1@ger.gmane.org>
References:  <20140826171657.0c79c54d@akips.com> <ltpuji$lus$1@ger.gmane.org>

On Fri, 29 Aug 2014 15:18:32 +0200
Ivan Voras <ivoras@freebsd.org> wrote:

> On 26/08/2014 09:16, Paul Koch wrote:
> 
> > How does this actually work?  Does it only take back what
> > FreeBSD considers to be "free" memory or can the host start taking
> > back "inactive", "wired", "zfs arc" memory ?  We tend to rely on
> > stuff being in inactive and zfs arc.  If we start swapping, we
> > are dead.
> 
> Under memory pressure, VMWare's ballooning will cause FreeBSD's internal
> "memory low" triggers to fire, which will release ARC memory, which will
> probably degrade your performance. But from what I've seen, for some
> reason, it's pretty hard to actually see the VMWare host activate
> ballooning, at least on FreeBSD servers. I've been using this combination
> for years and I only saw it once, for a trivial amount of memory. It's
> probably a last-resort measure.

Yer, releasing ARC memory would be tragic because it would already
contain useful data for us, and going back to disk/SAN would be a
hit.  We do set a limit on the ARC size at install time because it
appears to be very "aggressive" at consuming memory.
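
For reference, it's just the usual loader tunable; something along these
lines (the 2G value here is illustrative - we pick it per install):

  # /boot/loader.conf -- cap the ZFS ARC (value is illustrative)
  vfs.zfs.arc_max="2147483648"   # 2G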

We also constantly monitor/graph memory usage, so the customer can get
some idea of what is happening on their FreeBSD VM.

eg. http://www.akips.com/gz/downloads/sys-graph.html
    http://www.akips.com/gz/downloads/poller-graph.html

On that machine, the ARC has been limited to ~2G, and it appears to
always hover around there.  If ballooning were turned on and memory got
tight enough to cause the ARC to drop, at least they'd be able to go
back in time and see that something tragic happened.

 
> Also, VMWare will manage guest memory even without any guest software at
> all. It keeps track of recently active memory pages and may swap the
> unused ones out.

In computing time, how long is "recently" ???

We have very few running processes, and a handful of largish mmap'ed
files.  Most of the mmap'ed files are read ~40 times a second, so we'd
assume that they are always "recently" active.

Our largest mmap'ed file is only written to once a minute, with every
polled statistic.  Every memory page gets updated, but once a minute
may not count as "recently" in computing time.  If ballooning caused
that mmap'ed file to be paged out, we'd be toast.


> FWIW, I think ZFS's crazy memory footprint makes it unsuitable for VMs
> (or actually most non-file-server workflows...), but I'm sure most
> people here will not agree with me :D If you have the opportunity to try
> it out in production, just run a regular UFS2+SU in your VM for a couple
> of days and see the difference.

We actually started out with UFS2+SU on our data partition, but wanted
a "one size fits all" FreeBSD install configuration that would work ok
on bare metal and in a VM.  We have zero control over the platform the
customer uses - it ranges from a throw-away old desktop PC to high end
dedicated bare metal, or a VM in the data centre.  Since we are mostly
CPU bound, ZFS doesn't appear to be a performance problem for us in a
VM.


On a side note, one of the reasons we switched to ZFS is because we
"thought" we had a data corruption problem with UFS2 when shutting
down.  It took a while to discover what we were doing wrong.  Doh!!

At shutdown, running on physical hardware or in a VM, we'd get to
"All buffers synced" and the machine would hang for ages before powering
off or rebooting.  When it came back up, the file system was dirty and
hadn't been unmounted properly.  Googling for 'all buffers synced' came
up with various issues related to USB.

But, what was happening was... we have largish mmap'ed files (eg. 2G),
which we mmap with the MAP_NOSYNC flag.  The memory pages are written
to constantly, and we fsync() them every 600 seconds so we can control
when the disk writes occur.  It appears the fsync writes out the entire
mmap'ed file sequentially, because a quick calculation on the file size
and the raw disk write speed generally matches.  But at shutdown, we
were forgetting to do a final fsync on those big files, which meant the
OS had to write them out itself.  That doesn't appear to happen until
after the "all buffers synced" message though.  On real hardware, it
just looks like the machine has hung, but we did notice the disk LED
hard on.  Running in a VirtualBox VM, at shutdown we ran gstat/systat
on the FreeBSD host, which showed the disk stuck at 100% busy for ages
and ages after the "all buffers synced" message.  It was taking so long
that the VM was being killed ungracefully by the shutdown scripts.
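
In case it helps anyone else hitting the same thing, here is a minimal
sketch of the pattern (not our actual code - the file name, sizes and
loop counts are illustrative):

  /*
   * A file mapped MAP_SHARED|MAP_NOSYNC, dirtied continuously, flushed
   * with fsync() on our own schedule, and flushed one last time before
   * exit so the kernel isn't left writing the whole thing out after
   * "All buffers synced".
   */
  #include <sys/types.h>
  #include <sys/mman.h>
  #include <err.h>
  #include <fcntl.h>
  #include <stddef.h>
  #include <time.h>
  #include <unistd.h>

  #define MAP_SIZE  (64ULL * 1024 * 1024)  /* illustrative; ours are ~2G */
  #define SYNC_SECS 600                    /* flush every 600 seconds    */

  int
  main(void)
  {
      int fd = open("/var/db/example.map", O_RDWR | O_CREAT, 0644);
      if (fd == -1)
          err(1, "open");
      if (ftruncate(fd, (off_t)MAP_SIZE) == -1)
          err(1, "ftruncate");

      /* MAP_NOSYNC: the syncer leaves the dirty pages alone, so we
       * decide when the disk writes happen. */
      char *map = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
          MAP_SHARED | MAP_NOSYNC, fd, 0);
      if (map == MAP_FAILED)
          err(1, "mmap");

      time_t last_sync = time(NULL);
      for (int minute = 0; minute < 30; minute++) {  /* stand-in poll loop */
          /* touch every page, as the poller does once a minute */
          for (size_t off = 0; off < MAP_SIZE; off += 4096)
              map[off]++;

          if (time(NULL) - last_sync >= SYNC_SECS) {
              if (fsync(fd) == -1)         /* our scheduled flush */
                  err(1, "fsync");
              last_sync = time(NULL);
          }
          sleep(60);
      }

      /* The step we were forgetting: a final flush before exit.
       * (Running fsync(1) over the files from the shutdown script
       * achieves the same thing.) */
      if (fsync(fd) == -1)
          err(1, "fsync");
      munmap(map, MAP_SIZE);
      close(fd);
      return (0);
  }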

We use MAP_NOSYNC because without it, the default sync'ing behaviour on
large mmap'ed files sucks.  It seems the shutdown behaviour is similar
or much worse.

The problem on physical hardware was that there were no obvious messages
about what the machine was doing after the "all buffers synced" message!

Now we just do a fsync(1) of every mmap'ed file in our shutdown script,
and the machine shuts down clean and fast.

	Paul.
-- 
Paul Koch | Founder, CEO
AKIPS Network Monitor
http://www.akips.com
Brisbane, Australia


