Date:      Fri, 29 Aug 2014 11:54:42 -0500
From:      Alan Cox <alc@rice.edu>
To:        Steven Hartland <smh@freebsd.org>, Peter Wemm <peter@wemm.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Dmitry Morozovsky <marck@rinet.ru>, "Matthew D. Fuller" <fullermd@over-yonder.net>
Subject:   Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm
Message-ID:  <5400B052.6030103@rice.edu>
In-Reply-To: <4A4B2C2D36064FD9840E3603D39E58E0@multiplay.co.uk>
References:  <201408281950.s7SJo90I047213@svn.freebsd.org> <20140828211508.GK46031@over-yonder.net> <53FFAD79.7070106@rice.edu> <1617817.cOUOX4x8n2@overcee.wemm.org> <4A4B2C2D36064FD9840E3603D39E58E0@multiplay.co.uk>

On 08/29/2014 03:32, Steven Hartland wrote:
>> On Thursday 28 August 2014 17:30:17 Alan Cox wrote:
>> > On 08/28/2014 16:15, Matthew D. Fuller wrote:
>> > > On Thu, Aug 28, 2014 at 10:11:39PM +0100 I heard the voice of
>> > > Steven Hartland, and lo! it spake thus:
>> > >> It's very likely applicable to stable/9 although I've never used 9
>> > >> myself, we jumped from 9 direct to 10.
>> > >
>> > > This is actually hitting two different issues from the two bugs:
>> > >
>> > > - 191510 is about "ARC isn't greedy enough" on huge-memory machines,
>> > >   and from the osreldate that bug was filed on 9.2, so presumably is
>> > >   applicable.
>> > >
>> > > - 187594 is about "ARC is too greedy" (probably mostly on not-so-huge
>> > >   machines) and starves/drives the rest of the system into swap.  That
>> > >   I believe came about as a result of some unrelated change in the 10.x
>> > >   stream that upset the previous balance between ARC and the rest of
>> > >   the VM, so isn't a problem on 9.x.
>> >
>> > 10.0 had a bug in the page daemon that was fixed in 10-STABLE about
>> > three months ago (r265945).  The ARC was not the only thing affected
>> > by this bug.
>>
>> I'm concerned about potential unintended consequences of this change.
>>
>> Before, arc reclaim was driven by vm_paging_needed(), which was:
>> vm_paging_needed(void)
>> {
>>     return (vm_cnt.v_free_count + vm_cnt.v_cache_count <
>>         vm_pageout_wakeup_thresh);
>> }
>>
>> Now it's ignoring the v_cache_count and looking exclusively at
>> v_free_count.
>> "cache" pages are free pages that just happen to have known contents.
>> If I read this change right, zfs arc will now discard checksummed
>> cache pages to make room for non-checksummed pages:
>
> That test is still there, so if it needs to it will still trigger.
>
> However, that is often at a lower level, as vm_pageout_wakeup_thresh is
> only 110% of min free, whereas zfs_arc_free_target is based on target
> free, which is 4 * (min free + reserved).
>
>> +       if (kmem_free_count() < zfs_arc_free_target) {
>> +               return (1);
>> +       }
>> ...
>> +kmem_free_count(void)
>> +{
>> +       return (vm_cnt.v_free_count);
>> +}
>>
>> This seems like a pretty substantial behavior change.  I'm concerned
>> that it doesn't appear to count all the forms of "free" pages.
>>
>> I haven't seen the problems with the over-aggressive ARC since the
>> page daemon bug was fixed.  It's been working fine under pretty
>> abusive loads in the FreeBSD cluster after that fix.
>
> Others have also confirmed that even with r265945 they can still trigger
> the performance issue.
>
> In addition, without it we still have loads of RAM sitting there unused;
> in my particular experience we have 40GB of 192GB sitting there unused,
> and that was with a stable build from last weekend.
>
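
To put rough numbers on the two thresholds Steve describes above, here is
a small userland sketch that compares them.  The formulas are taken as
quoted (110% of min free versus 4 * (min free + reserved)), and the
v_free_min and v_free_reserved page counts are made-up illustrative
values, not ones measured on a real machine:

#include <stdio.h>

int
main(void)
{
        /* Hypothetical tuning values, in pages; the real values are
         * sized from physical memory at boot. */
        long v_free_min = 50000;
        long v_free_reserved = 7000;

        /* Page daemon wakeup point: 110% of min free. */
        long wakeup_thresh = (v_free_min / 10) * 11;

        /* Page daemon target, which the patch uses to seed
         * zfs_arc_free_target. */
        long v_free_target = 4 * (v_free_min + v_free_reserved);

        printf("vm_pageout_wakeup_thresh: %ld pages\n", wakeup_thresh);
        printf("zfs_arc_free_target:      %ld pages\n", v_free_target);
        return (0);
}

With these inputs that is 55000 pages versus 228000 pages, which is why
the ARC now starts trimming itself long before the page daemon would have
been woken.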


The Solaris code only imposed this limit on 32-bit machines where the
available kernel virtual address space may be much less than the
available physical memory.  Previously, FreeBSD imposed this limit on
both 32-bit and 64-bit machines.  Now, it imposes it on neither.  Why
continue to do this differently from Solaris?
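
For reference, the shape of the check I mean is roughly the following
sketch (not the committed code): it reuses the kmem_size()/kmem_used()
helpers that the pre-r270759 code already had, but confines the test to
32-bit platforms the way Solaris does.

#ifdef __i386__
        /*
         * On 32-bit platforms the kernel virtual address space can be
         * exhausted long before physical memory, so keep trimming the
         * ARC once the kmem arena is mostly consumed.
         */
        if (kmem_used() > (kmem_size() * 3) / 4)
                return (1);
#endif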


> With the patch we confirmed that, for those seeing the issue, both the
> RAM usage and the performance problems are resolved, with no reported
> regressions.
>
>> (I should know better than to fire a reply off before full fact
>> checking, but this commit worries me..)
>
> Not a problem, it's great to know people pay attention to changes and
> raise their concerns.  Always better to have a discussion about
> potential issues than to wait for a problem to occur.
>
> Hopefully the above gives you some peace of mind, but if you still have
> any concerns I'm all ears.
>


You didn't really address Peter's initial technical issue.  Peter
correctly observed that cache pages are just another flavor of free
pages.  Whenever the VM system is checking the number of free pages
against any of the thresholds, it always uses the sum of v_cache_count
and v_free_count.  So, to anyone familiar with the VM system, like
Peter, what you've done, which is to derive a threshold from
v_free_target but only compare v_free_count to that threshold, looks
highly suspect.

That said, I can easily believe that your patch works better than the
existing code, because it is closer in spirit to my interpretation of
what the Solaris code does.  Specifically, I believe that the Solaris
code starts trimming the ARC before the Solaris page daemon starts
writing dirty pages to secondary storage.  Now, you've made FreeBSD do
the same.  However, you've expressed it in a way that looks broken.

To wrap up, I think that you can easily write this in a way that
simultaneously behaves like Solaris and doesn't look wrong to a VM expert.
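
Concretely, something like the following sketch (untested, using the
names from the patch) would start ARC reclamation at the same point but
count pages the way the rest of the VM system does:

        /*
         * Begin trimming the ARC before the page daemon would start
         * reclaiming, but treat cache pages as free, as every other
         * threshold comparison in the VM system does.
         */
        if (vm_cnt.v_free_count + vm_cnt.v_cache_count <
            zfs_arc_free_target)
                return (1);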


> Out of interest would it be possible to update machines in the cluster to
> see how their workload reacts to the change?
>
>    Regards
>    Steve
>
>



