Date:      Wed, 15 Sep 2010 17:09:37 +0300
From:      Andriy Gapon <avg@freebsd.org>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-fs@freebsd.org, jhell <jhell@DataIX.net>, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject:   Re: zfs very poor performance compared to ufs due to lack of cache?
Message-ID:  <4C90D3A1.7030008@freebsd.org>
In-Reply-To: <6DFACB27CA8A4A22898BC81E55C4FD36@multiplay.co.uk>
References:  <5DB6E7C798E44D33A05673F4B773405E@multiplay.co.uk><AANLkTikNhsj5myhQCoPaNytUbpHtox1vg9AZm1N-OcMO@mail.gmail.com><4C85E91E.1010602@icyb.net.ua><4C873914.40404@freebsd.org><20100908084855.GF2465@deviant.kiev.zoral.com.ua><4C874F00.3050605@freebsd.org><A6D7E134B24F42E395C30A375A6B50AF@multiplay.co.uk><4C8D087B.5040404@freebsd.org><03537796FAB54E02959E2D64FC83004F@multiplay.co.uk><4C8D280F.3040803@freebsd.org><3FBF66BF11AA4CBBA6124CA435A4A31B@multiplay.co.uk><4C8E4212.30000@freebsd.org> <B98EBECBD399417CA5390C20627384B1@multiplay.co.uk> <D79F15FEB5794315BD8668E40B414BF0@multiplay.co.uk> <4C90B4C8.90203@freebsd.org> <6DFACB27CA8A4A22898BC81E55C4FD36@multiplay.co.uk>

on 15/09/2010 16:42 Steven Hartland said the following:
> ----- Original Message ----- From: "Andriy Gapon" <avg@freebsd.org>
>> on 15/09/2010 13:32 Steven Hartland said the following:
>>> === conclusion ===
>>> The interaction of zfs and sendfile is causing large amounts of memory
>>> to end up in the inactive pool and only the use of a hard min arc limit is
>>> ensuring that zfs forces the vm to release said memory so that it can be
>>> used by zfs arc.
>>
>> Memory ends up as inactive because of how sendfile works.  It first pulls data
>> into a page cache as active pages.  After pages are not used for a while, they
>> become inactive.  Pagedaemon can further recycle inactive pages, but only if
>> there is a shortage.  In your situation there is no shortage, so the pages just
>> stay there, but are ready to be reclaimed (or re-activated) at any moment.
>> They are not a waste!  Just a form of cache.
> 
> That doesn't seem to explain why, without setting a min ARC size, the IO to disk
> went nuts even though only a few files were being requested.
> 
> This however was prior to the upgrade to stable and all the patches, so I think I
> need to remove the configured min for ARC from loader and retest with the current
> code base to confirm this is still an issue.

Right, I described behavior that you should see after the patches are applied.
Before the patches it's too easy to drive the ARC size into the ground.
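
If you want to watch those page queues yourself while the box is serving files, the
counters are exported via sysctl as vm.stats.vm.v_active_count, v_inactive_count and
v_free_count.  A rough sketch, untested and only as good as my memory of the sysctl
names (variable names and output format are just illustrative), that polls them every
few seconds:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Fetch one of the vm.stats.vm.* page counters (they are unsigned ints). */
static uint64_t
pages(const char *name)
{
    unsigned int val = 0;
    size_t len = sizeof(val);

    if (sysctlbyname(name, &val, &len, NULL, 0) != 0)
        return (0);
    return (val);
}

int
main(void)
{
    uint64_t pgsz = (uint64_t)sysconf(_SC_PAGESIZE);

    for (;;) {
        /* Convert page counts to megabytes for readability. */
        printf("active %juM inactive %juM free %juM\n",
            (uintmax_t)(pages("vm.stats.vm.v_active_count") * pgsz >> 20),
            (uintmax_t)(pages("vm.stats.vm.v_inactive_count") * pgsz >> 20),
            (uintmax_t)(pages("vm.stats.vm.v_free_count") * pgsz >> 20));
        sleep(5);
    }
}

With sendfile enabled you should see the inactive queue grow as files are streamed
and then just sit there until something actually needs the memory.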

>> If ARC size doesn't grow in that condition, then it means that ZFS simply
>> doesn't need it to.
> 
> So what you're saying is that even with zero ARC there should be no IO required,
> as it should come directly from inactive pages? Another reason to retest with no
> hard-coded ARC settings.

No, I am not saying that.

>> The general problem of double-caching with ZFS still remains, and will remain;
>> nobody promised to fix that.
>> I.e. with sendfile (or mmap) you will end up with two copies of the data, one in
>> the page cache and the other in ARC.  That happens on Solaris too, no magic.
> 
> Obviously this is quite an issue, as a 1GB source file will require 2GB of memory
> to stream, hence totally outweighing any benefit of the zero copy that sendfile
> offers?

I can't quite compare oranges to apples, or speed to size, so that's up to you to
decide in your particular situation.

>> The things I am trying to fix are:
>> 1. Interaction between ARC and the rest of VM during page shortage; you don't
>> seem to have much of that, so you don't see it.  Besides, your range for ARC
>> size is quite narrow and your workload is so peculiar that your setup is not the
>> best one for testing this.
> 
> Indeed we have no other memory pressure, but holding two copies of the data is
> an issue. This doesn't seem to be the case with UFS, so where's the difference?

UFS doesn't have its own dedicated private cache like ARC.
It uses the buffer cache system, which means a unified cache.

>> 2. Copying of data from ARC to the page cache each time the same data is served
>> by sendfile.  You won't see many changes without monitoring ARC hits, as Wiktor
>> has suggested.  In the bad case there would be many hits, because the same data
>> is constantly copied from ARC to the page cache (and that simply kills any
>> benefit sendfile may have).  In the good case there would be far fewer hits,
>> because the data is not copied, but is served directly from the page cache.
> 
> Indeed. Where would this need to be addressed, as UFS doesn't suffer from this?

In ZFS.  But I don't think that this is going to happen any time soon, if at all.
The authors of ZFS specifically chose to use a dedicated cache, which is ARC.
Talk to them, or don't use ZFS, or get used to it.
ARC has a price, but it supposedly has benefits too.
Changing ZFS to use the buffer cache is a lot of work and effectively means not
using ARC, IMO.
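
Regarding the ARC hit monitoring I mentioned above: the counters live under
kstat.zfs.misc.arcstats.  Something like this (again only a sketch, I haven't
compiled it, and the sysctl names are from memory) would show whether the hit
counter races ahead while the same handful of files is being served:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* kstat.zfs.misc.arcstats.* entries are exported as 64-bit values. */
static uint64_t
arcstat(const char *name)
{
    uint64_t val = 0;
    size_t len = sizeof(val);

    if (sysctlbyname(name, &val, &len, NULL, 0) != 0)
        return (0);
    return (val);
}

int
main(void)
{
    uint64_t hits, misses;
    uint64_t prev_hits = arcstat("kstat.zfs.misc.arcstats.hits");
    uint64_t prev_misses = arcstat("kstat.zfs.misc.arcstats.misses");

    for (;;) {
        sleep(5);
        hits = arcstat("kstat.zfs.misc.arcstats.hits");
        misses = arcstat("kstat.zfs.misc.arcstats.misses");
        /* Print current ARC size plus hit/miss rates over the 5 second interval. */
        printf("arc %juM  hits/s %ju  misses/s %ju\n",
            (uintmax_t)(arcstat("kstat.zfs.misc.arcstats.size") >> 20),
            (uintmax_t)((hits - prev_hits) / 5),
            (uintmax_t)((misses - prev_misses) / 5));
        prev_hits = hits;
        prev_misses = misses;
    }
}

If hits/s stays high while nginx serves the same small set of files over and over,
that's the bad case: the data keeps being copied out of ARC into the page cache on
every request, and the zero-copy part of sendfile buys you nothing.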

>>> The source data, xls's and exported graphs can be found here:-
>>> http://www.multiplaygameservers.com/dropzone/zfs-sendfile-results.zip
>>
>> So, what problem, performance or otherwise, do you perceive with your system's
>> behavior?  Because I don't see any.
> 
> The initial problem was that with a default config, i.e. no hard-coded min or max
> on ARC, the machine very quickly becomes seriously IO bottlenecked, which simply
> doesn't happen on UFS.

Well, I thought that you hurried when you applied the patches and changed the
settings at the same time.  That made it impossible for you to judge properly what
the patches do and don't do for you.
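
Before you retest it's also worth double-checking which limits are actually in
effect, so you know you really are running with the defaults.  The loader tunables
show up as read-only sysctls; a quick one-shot check could look like this (sketch
only, and I'm assuming the values are exported as 64-bit quantities):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>

/* Print a 64-bit sysctl value in megabytes, or note that it is absent. */
static void
show(const char *name)
{
    uint64_t val = 0;
    size_t len = sizeof(val);

    if (sysctlbyname(name, &val, &len, NULL, 0) == 0)
        printf("%s = %juM\n", name, (uintmax_t)(val >> 20));
    else
        printf("%s not found\n", name);
}

int
main(void)
{
    show("vfs.zfs.arc_min");               /* loader-set minimum, if any */
    show("vfs.zfs.arc_max");               /* loader-set maximum, if any */
    show("kstat.zfs.misc.arcstats.size");  /* current ARC size */
    return (0);
}

If arc_min and arc_max still report the values from your loader.conf, the old
settings are still applied and the comparison against the patched defaults won't
be clean.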

> Now we have a very simple setup, so we can pick sensible values for min / max, but
> it still means that for every file being sent when sendfile is enabled:
> 1. There are two copies in memory, which is still going to mean that only half the
> number of files can be successfully cached and served without resorting to disk IO.

Can't really say, it depends on the size of the files.
Though it's approximately half of what could have fit in memory with e.g. UFS, yes.
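
To put very rough numbers on it, taking the 7GB you mention below and 1GB files
purely as an illustration: without sendfile a hot file costs about 1GB of ARC, so
something like seven such files can stay hot; with sendfile each one costs about
2GB (ARC copy plus page cache copy), so only three or so fit before you are back
to disk IO.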

> 2. sendfile isn't achieving what it states it should, i.e. a zero copy. Does this
> explain the other odd behaviour we noticed, high CPU usage from nginx?

sendfile should achieve zero copy, with all the patches applied, once both copies
of the data are settled in memory.  If you have insufficient memory to hold the
working set, then that's a different issue of moving competing data in and out of
memory.  And that may explain the CPU load, but that's just speculation.
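
For reference, this is the shape of the path we're talking about, nginx aside: the
file descriptor is handed straight to sendfile(2) and the kernel pushes the file's
pages onto the socket without a userland copy.  A minimal sketch against the FreeBSD
sendfile(2) signature (socket setup and retry handling for partial sends omitted;
the function name is just illustrative):

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Push a whole file out on an already connected socket.  With nbytes == 0
 * sendfile(2) keeps sending until end of file; sbytes reports how much
 * actually went out.
 */
static int
send_whole_file(int sock, const char *path)
{
    off_t sbytes = 0;
    int fd, error;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return (-1);
    error = sendfile(fd, sock, 0, 0, NULL, &sbytes, 0);
    if (error == -1)
        perror("sendfile");
    else
        printf("sent %jd bytes from %s\n", (intmax_t)sbytes, path);
    close(fd);
    return (error);
}

The point of the discussion above is what happens underneath that one call: whether
the pages it sends come straight from the page cache, or have to be copied out of
ARC first on every request.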

>> To summarize:
>> 1. With sendfile enabled you will have two copies of actively served data in
>> RAM, but perhaps slightly faster performance, because of avoiding another copy
>> to mbuf in sendfile(2).
>> 2. With sendfile disabled, you will have one copy of actively served data in RAM
>> (in ARC), but perhaps slightly slower performance because of a need to make a
>> copy to mbuf.
>>
>> Which would serve you better depends on size of your hot data vs RAM size, and
>> on actual benefit from avoiding the copying to mbuf.  I have never measured the
>> latter, so I don't have any real numbers.
>> From your graphs it seems that your hot data (multiplied by two) is larger than
>> what your RAM can accommodate, so you should benefit from disabling sendfile.
> 
> This is what I thought. Memory pressure has been eased from the initial problem
> point due to a memory increase from 4GB to 7GB in the machine in question, but it
> seems at this point both 1 and 2 are far from ideal situations, both having fairly
> serious side effects on memory use / bandwidth and possibly CPU, especially as hot
> data vs. clients is never going to be a static ratio and hence both are going to
> fall down at some point :(
> 
> I suspect this is going to be affecting quite a few users, with nginx and others
> that use sendfile for high performance file transmission becoming more and more
> popular, as is ZFS.
> 
> So the question is how do we remove these unexpected bottlenecks and make ZFS as
> efficient as UFS when sendfile is used?

At present I don't see any other way but brute force: throw even more RAM at the
problem.

Perhaps a miracle will happen and someone will post patches that radically change
ZFS behavior with respect to caches.  But I don't expect it (pessimist/realist).

-- 
Andriy Gapon


