From: Ted Mittelstaedt <tedm@mittelstaedt.us>
Date: Tue, 03 May 2011 21:48:40 -0700
To: freebsd-emulation@freebsd.org
Subject: Re: virtualbox I/O 3 times slower than KVM?

On 5/3/2011 11:25 AM, John wrote:
>
> -----Original Message-----
>> From: Ted Mittelstaedt
>> Sent: May 3, 2011 12:02 AM
>> To: Adam Vande More
>> Cc: freebsd-emulation@freebsd.org
>> Subject: Re: virtualbox I/O 3 times slower than KVM?
>>
>> On 5/2/2011 7:39 PM, Adam Vande More wrote:
>>> On Mon, May 2, 2011 at 4:30 PM, Ted Mittelstaedt wrote:
>>>
>>> that's sync within the VM. Where is the bottleneck taking place?
>>> If the bottleneck is hypervisor to host, then the guest-to-VM
>>> write may write all its data to a memory buffer in the hypervisor
>>> that is then slower-writing it to the filesystem. In that case,
>>> killing the guest without killing the VM manager will allow the
>>> buffer to finish emptying, since the hypervisor isn't actually
>>> being shut down.
>>>
>>> No, the bottleneck is the emulated hardware inside the VM process
>>> container. This is easy to observe: just start a bound process in
>>> the VM and watch top on the host side. Also, the hypervisor uses
>>> the native host I/O driver, so there's no reason for it to be
>>> slow. Since it's the emulated NIC which is the bottleneck, there
>>> is nothing left to issue the write. Further empirical evidence for
>>> this can be seen by watching gstat on a VM running with md- or
>>> ZVOL-backed storage. I already utilize ZVOLs for this, so it was
>>> pretty easy to confirm that no I/O occurs when the VM is paused or
>>> shut down.
>>>
>>> Is his app ever going to face the extremely bad scenario, though?
>>>
>>> The point is that it should be relatively easy to induce the
>>> patterns you expect to see in production. If you can't, I would
>>> consider that a problem. Testing out theories (performance-based
>>> or otherwise) on a production system is not a good way to keep the
>>> continued faith of your clients when the production system is a
>>> mission-critical one. Maybe throwing more hardware at a problem is
>>> the first line of defense for some companies; unfortunately I
>>> don't work for them. Are they hiring? ;) I understand the logic
>>> of such an approach and have even argued for it occasionally.
>>> Unfortunately payroll is already in the budget; extra hardware is
>>> not, even if it would be a net savings.
>>>
>>
>> Most if not all sites I've ever been in that run Windows servers
>> behave in this manner. With most of these sites, SOP is to "prove"
>> that the existing hardware is inadequate by loading whatever
>> Windows software management wants loaded, then letting the users on
>> the network scream about it. Then money magically frees itself up
>> when there wasn't any before, since of course management will never
>> blame the OS for the slowness, always the hardware.
>>
>> Understand I'm not advocating this, just making an observation.
>>
>> Understand that I'm not against testing, but I've seen people get
>> so engrossed in spending time constructing test suites that they
>> have ended up wasting a lot of money. I would have to ask: how much
>> time did the OP who started this thread take building two systems,
>> a Linux and a BSD system? How much time has he spent trying to get
>> the BSD system to "work as well as the Linux" system? Wouldn't it
>> have been cheaper for him to not spend that time and just put the
>> Linux system into production?
>>
>> Ted
>
> Thanks a lot for everyone's insights and suggestions. The CentOS on
> the KVM is a production server, so I took some time to prepare
> another CentOS guest on that KVM and did the test as Ted suggested
> before (for comparison, right now the test FreeBSD is the only guest
> on the virtualbox).
>
> What I do is cat the 330MB binary file (an XP service pack from
> Microsoft) 20 times into a single 6.6GB file, run "date" before and
> afterwards, and after the second date finishes, immediately force
> the power off. There are two observations:
>
> 1. The times to complete copying into this 6.6GB file were 72s, 44s,
> and 79s in three runs, presumably varying because there is another
> production VM on the same host. The average is 65s, so it's about
> 100MB/s.
> 2. After the immediate power-off, I did find the resulting file was
> less than 6.6GB. So indeed the VM claimed completion of the copy
> before it actually finished.
>

For clarity, what you're saying is that the CentOS guest OS claimed
the copy had completed before it actually did, correct? This is
consistent with async-mounted filesystems, which I believe is the
default under CentOS. Your guest is mounting its own filesystem
inside the VM as an async mount. So when the copy completes and you
get back to the shell prompt on the guest, a memory buffer in the
guest OS is still copying the last bits of the file to the disk.
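An easy way to see that write-behind buffer at work is to rerun your
test with an explicit sync(8) before the second date, so the timing
includes flushing the guest's buffer cache. A rough sketch (the
filenames here are just stand-ins for your 330MB service pack and the
output file):

  #!/bin/sh
  # Time the 20-copy concatenation INCLUDING the buffer-cache flush.
  date
  i=1
  while [ $i -le 20 ]; do
      cat xpsp3.exe          # stand-in name for the 330MB file
      i=$((i + 1))
  done > bigfile.bin
  # sync(8) pushes the dirty buffers out to disk; on Linux it does
  # not return until the writes have gone down to the device.
  sync
  date

With the sync in there, the gap between the two dates is how long the
data really took to reach the (virtual) disk, not just how long it
took to land in the guest's memory buffers.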
> I then did the same thing on the virtualbox. Since I don't want the
> above premature I/O, I made sure "Use Host I/O cache" is unchecked
> for the VM storage.
>

That setting isn't going to change how the guest async-mounts its
filesystems. All it does is force the hypervisor to not use some
caching that the host OS provides to the hypervisor.
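To make the distinction concrete: that checkbox is a host-side
VirtualBox setting, while async vs. sync is decided by the guest's
own mount options. Roughly like this (the VM and controller names are
placeholders for whatever yours are actually called):

  # Host side: the "Use Host I/O cache" checkbox as a VBoxManage
  # command. It only controls the host OS caching the hypervisor
  # uses for the disk-image file.
  VBoxManage storagectl "testvm" --name "SATA Controller" \
      --hostiocache off

  # Guest side (a Linux guest): this is where async vs. sync lives.
  mount | grep ' / '        # root is async unless 'sync' shows up
                            # in the mount options
  mount -o remount,sync /   # force synchronous writes, for a test

Flipping the host cache off while the guest still async-mounts its
filesystem will give you exactly the result you saw: slower writes,
and still a short file after a hard reset.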
> 1. The times to complete copying into this 6.6GB file were 119s and
> 92s; the average is 105s, so the speed is 62MB/s.
> 2. After immediately hitting "Reset", the machine couldn't boot.
> Both times it asked me to do fsck for that partition (GPT 2.2T). But
> after finally powering up, I found the file was also less than 6.6GB
> both times as well.
>

I would imagine this would happen.

> So it looks like virtualbox also suffers a caching problem? Or did I
> do anything wrong?
>

There isn't a "caching problem". As we have said in this thread, the
speed at which the actual write happens is the same under the FreeBSD
guest and the CentOS guest. The only difference is that the FreeBSD
guest is sync-mounting its filesystem within the virtual machine and
the CentOS guest is async-mounting its filesystem within the virtual
machine.

An async mount is always faster for writes because what is actually
going on is that the write goes to a memory buffer, and then the OS
completes the write "behind the scenes". In many cases, when the data
in a file is rapidly changing, a write may never go to disk at all: if
the OS sees successive writes to the same part of the file, it will
simply make the writes to the memory buffer, then get around to
updating the disk when it feels like it.

> I didn't spend extra time optimizing either the Linux or the
> FreeBSD; they are both stock production systems from CentOS and
> FreeBSD. I just want to have a production-quality system without too
> much customization work.
>
> Also, most servers will be mail servers and web servers, with some
> database use. Granted, copying a 6.6GB file is atypical on these
> servers, but I just want to get an idea of what the server is
> capable of. I do not know of a benchmark that matches my usage
> pattern and is readily available on both CentOS and FreeBSD.
>

What it really sounds like to me is that you're just not
understanding the difference in how the filesystems are mounted.

For starters, you have your host OS which the hypervisor is running
on. You have a large file on that host which comprises the VM's disk,
either FreeBSD or CentOS. When the FreeBSD or CentOS guest makes its
writes, it is making them into that large file. If the host has that
file on a sync-mounted filesystem, that slows the hypervisor's access
to the file. And then you have the guest OSes, which themselves have
their own memory buffers and mount chunks of that file as their
filesystems. They can mount these chunks sync or async; if they mount
them async, access to those chunks is faster too.

There is a tradeoff here. If you sync-mount a filesystem, then when
the operating system halts or crashes there is usually little to no
filesystem damage, but access to the disk will be slowest. If you
async-mount a filesystem, then when the operating system crashes you
will have a lot of garbage and file corruption, but access will be
the fastest.

A very common configuration for a mailserver is, when you're
partitioning the filesystem, to create the usual /, swap, /usr, /tmp,
and /var, then create an additional /home and "mail". Then you either
mount "mail" on /var/mail, or you mount it on /mail and softlink
/var/mail to /mail. Then you set up /tmp, /home, and /mail (or
/var/mail) as async mounts and everything else as sync mounts, and
softlink /var/spool to /tmp. That way, if the mailserver reboots or
crashes, the program files are generally not affected even if the
e-mail is scotched, yet you get the fastest possible disk
performance. If a partition is so far gone that it cannot even be
repaired by fsck, you can just newfs it and start over. It also makes
it a lot easier to create a dump/restore backup scheme.
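In /etc/fstab terms that layout comes out something like this. The
ad0 partition letters are only placeholders for whatever your slices
really are, and plain "rw" gets you FreeBSD's default (mostly
synchronous) behavior; spell out "rw,sync" if you want strictly
synchronous writes:

  # Device       Mountpoint  FStype  Options   Dump  Pass#
  /dev/ad0s1b    none        swap    sw        0     0
  /dev/ad0s1a    /           ufs     rw        1     1
  /dev/ad0s1d    /usr        ufs     rw        2     2
  /dev/ad0s1e    /var        ufs     rw        2     2
  # the fast-and-loose partitions: async write-behind
  /dev/ad0s1f    /tmp        ufs     rw,async  2     2
  /dev/ad0s1g    /home       ufs     rw,async  2     2
  /dev/ad0s1h    /mail       ufs     rw,async  2     2

Plus "ln -s /mail /var/mail" if you mount the mail partition on /mail
rather than directly on /var/mail.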
With CentOS/Linux it's a bit different, because that OS mounts the
entire disk on / and creates subdirectories for everything. That is
one of the (many) reasons I don't ever use Linux for mailservers: you
do not have the same kind of fine-grained control. But you can create
multiple partitions on CentOS, too.

Also, the fact is that the FreeBSD filesystem and OS have been
heavily optimized, and if the mailserver isn't that busy you don't
need to bother async-mounting any of its partitions, because the
system will simply spawn more processes. You have to think of it this
way: with a mailserver, say sync mounting causes each piece of e-mail
to spend 15ms in disk access and async mounting cuts that to 5ms.
Well, if the mailserver normally runs about 10 simultaneous sendmail
instances under async mounting, then it will run 30 instances under
sync mounting at the same throughput. And with each instance only
taking 100MB of RAM, you can toss a couple of extra GB of RAM in the
server and forget about it.

Ted

_______________________________________________
freebsd-emulation@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-emulation
To unsubscribe, send any mail to
"freebsd-emulation-unsubscribe@freebsd.org"