Date:      Tue, 03 May 2011 21:48:40 -0700
From:      Ted Mittelstaedt <tedm@mittelstaedt.us>
To:        freebsd-emulation@freebsd.org
Subject:   Re: virtualbox I/O 3 times slower than KVM?
Message-ID:  <4DC0DAA8.8080607@mittelstaedt.us>
In-Reply-To: <32556218.1304447122068.JavaMail.root@mswamui-blood.atl.sa.earthlink.net>
References:  <32556218.1304447122068.JavaMail.root@mswamui-blood.atl.sa.earthlink.net>

On 5/3/2011 11:25 AM, John wrote:
>
> -----Original Message-----
>> From: Ted Mittelstaedt<tedm@mittelstaedt.us> Sent: May 3, 2011
>> 12:02 AM To: Adam Vande More<amvandemore@gmail.com> Cc:
>> freebsd-emulation@freebsd.org Subject: Re: virtualbox I/O 3 times
>> slower than KVM?
>>
>> On 5/2/2011 7:39 PM, Adam Vande More wrote:
>>> On Mon, May 2, 2011 at 4:30 PM, Ted
>>> Mittelstaedt<tedm@mittelstaedt.us <mailto:tedm@mittelstaedt.us>>
>>> wrote:
>>>
>>> that's sync within the VM.  Where is the bottleneck taking place?
>>> If the bottleneck is hypervisor to host, then the guest-to-VM
>>> write may write all its data to a memory buffer in the
>>> hypervisor that is then slowly writing it to the filesystem.  In
>>> that case killing the guest without killing the VM manager will
>>> allow the buffer to finish emptying since the hypervisor isn't
>>> actually being shut down.
>>>
>>>
>>> No, the bottleneck is the emulated hardware inside the VM
>>> process container.  This is easy to observe: just start a bound
>>> process in the VM and watch top on the host side.  Also, the
>>> hypervisor uses the native host IO driver, so there's no reason
>>> for it to be slow.  Since it's the emulated NIC which is the
>>> bottleneck, there is nothing left to issue the write.  Further
>>> empirical evidence for this can be seen by watching gstat on a VM
>>> running with md or ZVOL backed storage.  I already utilize ZVOLs
>>> for this so it was pretty easy to confirm no IO occurs when the VM
>>> is paused or shut down.
>>>
>>> Is his app going to ever face the extremely bad scenario,
>>> though?
>>>
>>>
>>> The point is it should be relatively easy to induce the patterns
>>> you expect to see in production.  If you can't, I would consider
>>> that a problem.  Testing out theories (performance-based or
>>> otherwise) on a production system is not a good way to keep the
>>> continued faith of your clients when the production system is a
>>> mission-critical one.  Maybe throwing more hardware at a problem is
>>> the first line of defense for some companies; unfortunately I don't
>>> work for them.  Are they hiring? ;)  I understand the logic of
>>> such an approach and have even argued for it occasionally.
>>> Unfortunately payroll is already in the budget; extra hardware is
>>> not, even if it would be a net savings.
>>>
>>
>> Most if not all sites I've ever been in that run Windows servers
>> behave in this manner.  With most of these sites SOP is to "prove"
>> that the existing hardware is inadequate by loading whatever
>> Windows software that management wants loaded then letting the
>> users on the network scream about it.  Then money magically frees
>> itself up when there wasn't any before.  Since of course management
>> will never blame the OS for the slowness, always the hardware.
>>
>> Understand I'm not advocating this, just making an observation.
>>
>> Understand that I'm not against testing but I've seen people get so
>> engrossed in spending time constructing test suites that they have
>> ended up wasting a lot of money.  I would have to ask, how much
>> time did the OP who started this thread take building 2 systems, a
>> Linux and a BSD system?  How much time has he spent trying to get
>> the BSD system to "work as well as the Linux" system?  Wouldn't it
>> have been cheaper for him to not spend that time and just put the
>> Linux system into production?
>>
>> Ted
>
> Thanks a lot for everyone's insights and suggestions.  The CentOS on
> the KVM is a production server, so I took some time to prepare
> another CentOS guest on that KVM and did the test as Ted suggested
> before (for comparison, right now the test FreeBSD is the only guest
> on the VirtualBox).
>
> What I do is cat the 330MB binary file (an XP service pack from
> Microsoft) 20 times into a single 6.6GB file, run "date" before and
> afterwards, and after the second date finishes, immediately force a
> power shutdown.  There are two observations:
>
> 1. The time to complete copying into this 6.6GB file was 72s, 44s,
> and 79s in three runs, presumably because there is another production
> VM on the same host.  The average is 65s, so it's about 100MB/s.
>
> 2. After the immediate power down, I did find that the resulting file
> was less than 6.6GB.  So indeed the VM claimed the completion of the
> copying before it actually did.
>

For clarity, what you're saying is that the CentOS guest OS claimed the
copy had completed before it actually did, correct?  This is consistent
with async-mounted filesystems, which I believe are the default under
CentOS.  Your guest is mounting its own filesystem inside the VM async.
So when the copy completes and you get back to the shell prompt on the
guest, a memory buffer in the guest OS is still writing the last bits
of the file to the disk.
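
If you want to see that for yourself, something along these lines on
the CentOS guest will show it (the file name here is made up, use
whatever you actually copied):

   # show the mount options for /; no "sync" in the list means async
   mount | grep ' / '

   # the 20x copy, timed the same way you did it
   date
   for i in $(seq 1 20); do cat xpsp3.exe; done > bigfile
   date

   # force the guest to flush its buffers to the virtual disk
   sync

If you run "sync" and let it return before forcing the power off, the
file should come out the full 6.6GB.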

> I then did the same thing on the VirtualBox.  Since I don't want the
> above premature I/O, I made sure the "Use Host I/O cache" box is
> unchecked for the VM storage.
>

That setting isn't going to change how the guest async-mounts its
filesystems.  All it does is tell the hypervisor not to use the
caching that the host OS provides to it.
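
For what it's worth, you can flip that same setting from the command
line with VBoxManage; roughly like this (the VM and controller names
are just placeholders for whatever yours are called):

   VBoxManage storagectl "freebsd-test" --name "SATA Controller" \
       --hostiocache off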

> 1. The time to complete copying into this 6.6GB file was 119s and
> 92s; the average is 105s, so the speed is 62MB/s.
>
> 2. After immediately hitting "Reset" on the machine, it couldn't
> boot.  Both times it asked me to run fsck on that partition (GPT,
> 2.2T).  But after finally powering up, I found the file was also less
> than 6.6GB both times as well.
>

I would imagine this would happen.

> So it looks like VirtualBox also suffers from a caching problem?  Or
> did I do anything wrong?
>

There isn't a "caching problem."  As we have said earlier in this
thread, the speed at which the actual write happens is the same under
the FreeBSD guest and the CentOS guest.  The only difference is that
the FreeBSD guest is sync-mounting its filesystem within the virtual
machine, while the CentOS guest is async-mounting its filesystem within
the virtual machine.

An async mount is always faster for writes because what is actually
going on is that the write goes to a memory buffer and the OS then
completes the write "behind the scenes."  In many cases, when the data
in a file is rapidly changing, an individual write may never go to disk
at all: if the OS sees successive writes to the same part of the file,
it will simply make the writes to the memory buffer and get around to
updating the disk when it feels like it.
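
On the FreeBSD guest you can see and change this with mount; a rough
sketch, with a made-up device name:

   # the flags will show "asynchronous" or "synchronous" when either
   # is set; neither means the default
   mount | grep /usr

   # remount the filesystem async without taking it offline
   mount -u -o async /dev/ada0s1f /usr

(or put async in the options column of /etc/fstab to make it stick
across reboots.)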

> I didn't spend extra time optimizing either the Linux or the FreeBSD
> system; they are both stock production systems from CentOS and
> FreeBSD.  I just want to have a production-quality system without too
> much customized work.
>
> Also, most servers will be mail servers and web servers, with some
> database use.  Granted, copying a 6.6GB file is atypical on these
> servers, but I just want to get an idea of what the server is capable
> of.  I do not know of any test software that can benchmark my usage
> pattern and is readily available on both CentOS and FreeBSD.
>

What it really sounds like to me is that you're just not understanding
the difference in how the filesystems are mounted.  For starters, you
have your host OS, which the hypervisor is running on.  You have a
large file on that host which comprises the VM's disk, either FreeBSD
or CentOS.  When the FreeBSD or CentOS guest makes its writes, it is
making them into that large file.  If the host has the filesystem
holding that file sync-mounted, then the hypervisor's access to that
file will be slower.

And then you have the guest OSes, which have their own memory buffers
and mount chunks of that file as their filesystems.  They can mount
those chunks sync or async.  If they mount them async, access to those
chunks is faster as well.

There is a tradeoff here.  If you sync-mount a filesystem and the
operating system halts or crashes, there is usually little to no
filesystem damage, but access to the disk will be slowest.  If you
async-mount a filesystem and the operating system crashes, you will
have a lot of garbage and file corruption, but access will be the
fastest.

A very common configuration for a mailserver is to partition the disk
into the usual /, swap, /usr, /tmp, and /var, then create an additional
/home and a "mail" partition.  Then you either mount "mail" on
/var/mail, or you mount it on /mail and softlink /var/mail to /mail.
Then you set up /tmp, /home, and /mail (or /var/mail) as async mounts
and everything else as sync mounts, and softlink /var/spool to /tmp.
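
As a rough sketch, the fstab for a layout like that might look
something like this (the device names are just examples, adjust to
your disk):

   # Device        Mountpoint  FStype  Options    Dump  Pass#
   /dev/ada0s1a    /           ufs     rw,sync    1     1
   /dev/ada0s1b    none        swap    sw         0     0
   /dev/ada0s1d    /usr        ufs     rw,sync    2     2
   /dev/ada0s1e    /var        ufs     rw,sync    2     2
   /dev/ada0s1f    /tmp        ufs     rw,async   2     2
   /dev/ada0s1g    /home       ufs     rw,async   2     2
   /dev/ada0s1h    /mail       ufs     rw,async   2     2

   # then the softlinks (after moving the existing directories aside)
   ln -s /mail /var/mail
   ln -s /tmp  /var/spool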

That way, if the mailserver reboots or crashes, the program files are
generally not affected even if the e-mail is scotched, yet you get the
fastest possible disk performance.  If a partition is so far gone that
it cannot even be repaired by fsck, you can just newfs it and start
over.  It also makes a dump/restore backup scheme a lot easier to set
up.
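
For example, something like this does a full level-0 dump of the mail
partition and brings it back after a newfs (the backup path is just an
example):

   # full dump of /mail; -L takes a snapshot of the live filesystem
   dump -0Lauf /backup/mail.dump /mail

   # restore into a freshly newfs'ed and mounted /mail
   cd /mail && restore -rf /backup/mail.dump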

With CentOS/Linux it's a bit different because, by default, that OS
mounts the entire disk on / and creates subdirectories for everything.
That is one of the (many) reasons I don't ever use Linux for
mailservers: you do not have the same kind of fine-grained control.
But you can create multiple partitions on CentOS, too.

Also, the fact is that the FreeBSD filesystem and OS have been heavily
optimized, and if the mailserver isn't that busy you don't need to
bother async-mounting any of its partitions, because the system will
simply spawn more processes.  You have to think of it this way: with a
mailserver, let's say sync mounting causes each piece of e-mail to
spend 15 ms in disk access and async mounting cuts that to 5 ms.  If
the mailserver normally runs about 10 simultaneous sendmail instances
under async mounting, then it will run 30 instances under sync mounting
at the same throughput, and with each instance only taking 100MB of
RAM, you can toss a couple of extra GB of RAM in the server and forget
about it.
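
Back of the envelope, using those made-up numbers:

   async:  10 instances / 5 ms per message   = 2 messages per ms
   sync:   2 messages per ms x 15 ms         = 30 instances needed
   RAM:    (30 - 10) extra instances x 100MB = roughly 2GB more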

Ted

> _______________________________________________
> freebsd-emulation@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-emulation To
> unsubscribe, send any mail to
> "freebsd-emulation-unsubscribe@freebsd.org"



