Date:      Tue, 28 Jan 2020 09:00:03 -1000 (HST)
From:      Jeff Roberson <jroberson@jroberson.net>
To:        Peter Jeremy <peter@rulingia.com>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: Minimum memory for ZFS (was Re: svn commit: r356758 - in head/usr.sbin/bsdinstall: . scripts)
Message-ID:  <alpine.BSF.2.21.9999.2001280851370.1198@desktop>
In-Reply-To: <20200128064206.GC18006@server.rulingia.com>
References:  <202001230207.00N274xO042659@mail.karels.net> <alpine.BSF.2.21.9999.2001261050070.1198@desktop> <20200128064206.GC18006@server.rulingia.com>

On Tue, 28 Jan 2020, Peter Jeremy wrote:

> On 2020-Jan-26 10:58:15 -1000, Jeff Roberson <jroberson@jroberson.net> wrote:
>> My proposal is this, limit ARC to some reasonable fraction of memory, say
>> 1/8th, and then do the following:
>>
>> On expiration from arc place the pages in a vm object associated with the
>> device.  The VM is now free to keep them or re-use them for user memory.
>>
>> On miss from arc check the page cache and take the pages back if they
>> exist.
>>
>> On invalidation you need to invalidate the page cache.
>
> ZFS already has a mechanism for the VM system to request that ARC be
> shrunk.  This was designed around the Solaris VM system and isn't a perfect
> match with the FreeBSD VM system.  There has been a lot of work, within
> FreeBSD, over the years to improve the way this behaves but it's obviously
> not perfect.  One point of confusion is that FreeBSD and Solaris use
> identical terms to mean different things within their VM systems.

It wasn't a very good match for the Solaris VM system either.  According
to various accounts from people involved in the project, they had
intended to integrate with the page cache but ran out of time.  It was
meant for file servers and made a lot of compromises for general-purpose
workloads that have been fixed over time.  For example, fsync requiring
the ZIL was not in the original design.  The ARC's lack of page cache
integration is one of those compromises; mmap and sendfile are others.
There are ways to fix those as well, FWIW, but it requires some changes
to our fault/object APIs.
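
To make the shape of what I proposed above a little more concrete, the
three hooks could look roughly like this.  This is only a sketch; every
type and function named here (arc_spill, arc_evict_hook, and so on) is
hypothetical and none of it is existing ZFS or VM code:

#include <sys/param.h>
#include <vm/vm.h>

/* One reclaimable VM object per device holds pages evicted from ARC. */
struct arc_spill {
	vm_object_t	as_obj;
};

/*
 * Eviction: instead of freeing a buffer's pages, park them in the
 * device's VM object so the page daemon can keep them or reuse them
 * for user memory as it sees fit.
 */
void	arc_evict_hook(struct arc_spill *as, vm_pindex_t pidx,
	    vm_page_t *m, int npages);

/*
 * Miss: before issuing I/O, look for the pages in the device's VM
 * object and take them back into ARC if they are still resident.
 * Returns 0 if every page was recovered from the page cache.
 */
int	arc_miss_hook(struct arc_spill *as, vm_pindex_t pidx,
	    vm_page_t *m, int npages);

/*
 * Invalidation: when the corresponding blocks are freed or rewritten,
 * the pages must be removed from the VM object too, or a later miss
 * could resurrect stale data.
 */
void	arc_invalidate_hook(struct arc_spill *as, vm_pindex_t pidx,
	    int npages);

The read path would try arc_miss_hook() before going to disk, and the
free/rewrite paths would call arc_invalidate_hook() so nothing stale can
come back.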

>
>> ARC already allows spilling to SSD.  I don't know the particulars of the
>> interface but we're essentially spilling to memory that can be reclaimed
>> by the page daemon as necessary.
>
> This is L2ARC.  ZFS expects that L2ARC refers to a fixed amount of space
> that ZFS has control over.  I'm unaware of any mechanism for an external
> system to reclaim L2ARC space - it would require the external system to
> request ZFS (via an, AFAIK, non-existent interface) to free L2ARC objects.
>
> ZFS only implements a single level of L2ARC so re-using L2ARC for this new
> caching mechanism would also prevent the use of traditional SSD-based L2ARC
> on FreeBSD.

My point was only that ZFS already has hooks for moving things out of 
the ARC, not that this is a perfect fit.

>
> Since ZFS already has a mechanism for an external system to request that
> ZFS release memory, I believe effort would be better expended in getting
> FreeBSD to better exchange memory pressure data with ZFS via that existing
> mechanism, rather than inventing a new mechanism that is intended to
> achieve much the same result, whilst effectively removing a ZFS feature.
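
For context, the existing mechanism is, as far as I recall the port, the
vm_lowmem eventhandler that the FreeBSD ZFS code registers so the page
daemon can ask ARC to shed memory.  Roughly (the callback body is
simplified here for illustration):

#include <sys/param.h>
#include <sys/eventhandler.h>

static eventhandler_tag arc_event_lowmem;

static void
arc_lowmem(void *arg __unused, int howto __unused)
{
	/* The real callback computes a reclaim target and wakes the
	 * ARC reclaim thread; details elided. */
}

static void
arc_lowmem_init(void)
{
	arc_event_lowmem = EVENTHANDLER_REGISTER(vm_lowmem, arc_lowmem,
	    NULL, EVENTHANDLER_PRI_FIRST);
}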

It is very challenging to implement a sane global memory policy when you 
have a number of uncoordinated local caches making decisions.  Many 
people have invested effort into making this system work better, but we 
still need tuning to run properly at different memory sizes and 
workloads.  I find this solution to be an architecturally weak 
compromise.  If the trivially small amount of performance afforded by 
additional ARC is important to you, it is much easier to opt in with a 
sysctl while everyone else enjoys a more self-tuning and dynamic system.
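
To be clear about what the opt-in would look like, it is a single knob.
A minimal sketch using the stock sysctl macros, with an invented name
(vfs.zfs.arc_fixed_max) that is not an existing tunable:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

/* 0 (the default) keeps the fully self-tuning behaviour. */
static unsigned long zfs_arc_fixed_max = 0;

SYSCTL_DECL(_vfs_zfs);
SYSCTL_ULONG(_vfs_zfs, OID_AUTO, arc_fixed_max, CTLFLAG_RWTUN,
    &zfs_arc_fixed_max, 0,
    "Opt-in fixed ARC limit in bytes (0 = self-tuning)");

Whoever really wants a big, fixed ARC sets it once in loader.conf;
everyone else gets the dynamic behaviour by default.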

Thanks,
Jeff

>
> -- 
> Peter Jeremy
>


