Date:      Sat, 23 Mar 2013 16:49:06 -0500
From:      Alan Cox <alan.l.cox@gmail.com>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        arch@freebsd.org, David Wolfskill <david@catwhisker.org>
Subject:   Re: VM_BCACHE_SIZE_MAX on i386
Message-ID:  <CAJUyCcMXysO95CqxtWnbkzU5nJx757ktijQi6D+UEU5BE=z91g@mail.gmail.com>
In-Reply-To: <20130323211001.GN3794@kib.kiev.ua>
References:  <20130323211001.GN3794@kib.kiev.ua>

On Sat, Mar 23, 2013 at 4:10 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:

> The unmapped I/O work makes it possible to avoid mapping the vnode
> pages into kernel memory for UFS mounts, if the underlying geoms and
> disk drivers accept unmapped BIOs.  Converting all geom classes and
> drivers, while not very hard, is quite a big task which requires a
> lot of validation on unusual configurations and rare hardware.  I
> decided to provide transient remapping for the classes which are not
> yet converted, which allowed the work to go into HEAD much earlier
> than a full conversion would, if the latter ever happens.
>
> When an unmapped BIO is passed down the geom stack and the next geom
> is not marked as accepting unmapped BIOs, KVA space in the so-called
> transient map is allocated and the pages are mapped there.  On
> architectures with ample KVA, creating the transient map is not an
> issue, but it is very delicate on architectures with limited KVA,
> i.e. mostly the 32-bit architectures.
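>
> Schematically, the remapping step looks like this (a sketch only;
> the allocation helper and the exact flow are approximations, the
> real code is g_io_transient_map_bio() in geom_io.c):
>
>     /* The next geom cannot take unmapped pages: map transiently. */
>     if ((bp->bio_flags & BIO_UNMAPPED) != 0 &&
>         (pp->flags & G_PF_ACCEPT_UNMAPPED) == 0) {
>             size = round_page(bp->bio_ma_offset + bp->bio_length);
>             /* Carve KVA out of the dedicated transient map. */
>             addr = kmem_alloc_nofault(bio_transient_map, size);
>             /* Enter the wired pages into the borrowed KVA range. */
>             pmap_qenter(addr, bp->bio_ma, bp->bio_ma_n);
>             bp->bio_data = (caddr_t)addr + bp->bio_ma_offset;
>             bp->bio_flags &= ~BIO_UNMAPPED;
>     }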
>
> To avoid disturbing the KVA layout and the current balance, I split
> the space previously allocated to the buffer map into the 90% that
> is still used by the buffer map and the remaining 10%, which is
> dedicated to the transient mapping.  The rationale for the split is
> that a typical load has a 9/1 split between user data and metadata
> buffers, and almost all user data buffers are unmapped.
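>
> To make the intent concrete with the i386 numbers below: of a 200MB
> cap, roughly 180MB would stay with the buffer map and roughly 20MB
> would go to the transient mapping.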
>
> More precisely, the transient map is sized to 10% of the maximum
> _theoretically_ allowed buffer map size on the arch.  The real
> buffer map is usually smaller, sized proportionally to the available
> RAM.  The details of the allocation are in
> vfs_bio.c:kern_vfs_bio_buffer_alloc().  The function uses the
> maxbcache tunable, initialized from VM_BCACHE_SIZE_MAX by default.
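>
> In kern_vfs_bio_buffer_alloc() terms, the sizing goes roughly like
> this (a paraphrased sketch, not the exact code; the real function
> also derives nbuf from the available RAM):
>
>     maxbuf_sz = maxbcache;          /* theoretical max, from the
>                                        VM_BCACHE_SIZE_MAX default or
>                                        the kern.maxbcache tunable */
>     buf_sz = (long)nbuf * BKVASIZE; /* real size, scaled to RAM */
>     if (buf_sz < maxbuf_sz / 10 * 9) {
>             /* KVA to spare: hand the whole rest to transient. */
>             biotmap_sz = maxbuf_sz - buf_sz;
>     } else {
>             /* Buffer map fills its KVA: give up 10% of it. */
>             biotmap_sz = buf_sz / 10;
>             buf_sz -= biotmap_sz;
>     }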
>
> But, on i386 !PAE, VM_BCACHE_SIZE_MAX is bigger than the maximal
> buffer cache size reachable on a 4GB RAM machine.  The max buffer
> cache map size is around 110MB, while VM_BCACHE_SIZE_MAX is 200MB.
> This causes the bio_transient_map oversizing, eating an additional
> 90MB of precious KVA on i386.
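>
> That is, with the sizing above the ~110MB buffer map falls below 90%
> of the 200MB cap, so the transient map is handed the entire
> remainder, 200MB - 110MB = 90MB, instead of the intended ~10% share.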
>
>

The additional KVA that we had to reserve for the vm_page radix tree nodes
already got me thinking about VM_BCACHE_SIZE_MAX a couple of weeks ago.
With the extra KVA pressure that is inherent to PAE, e.g., a larger vm_page
struct, we really can't afford to let the buffer map KVA allocation grow
much beyond what it would be for a 4GB machine anyway.  Moreover, your
work makes the size of the buffer map less important, because the buffer
map will see decreasing use as drivers are converted to allow unmapped
I/O.  So, I would encourage you to simply use the same cap, based on a
4GB machine, for both PAE and !PAE.



> By itself this +90MB KVA use is not critical, but it starts
> conflicting with other KVA hogs, like the nvidia blob, which
> seemingly tries to remap the whole aperture (256+ MB) into the KVA.
> The issue was reported by dwh, and it appeared to be quite
> mysterious, since his machine has no useful way to report panics
> from a failed X.
>
> The resolution I propose is to change VM_BCACHE_SIZE_MAX in the i386
> !PAE case to make it equal to the exact maximum size of the buffer
> cache.  Note that maxbcache can be tuned from the loader prompt, so
> the effect of the change would only be on the i386 machines with a
> tuned buffer cache.
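>
> The change itself is essentially a one-line adjustment of the
> constant in the i386 headers, along these lines (a sketch; the
> exact value is in the patch below):
>
>     /* Match the real i386 !PAE buffer cache maximum (~110MB). */
>     #define VM_BCACHE_SIZE_MAX  (110 * 1024 * 1024)  /* was 200MB */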
>
> Also, the patch doubles the size of the transient map to 1/5 of the
> max buffer cache.  This gives 180 parallel remapped I/Os in flight,
> since I consider the recalculated 90 I/Os too small even for i386.
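>
> Assuming the usual 128KB MAXPHYS, that is 110MB / 5 ~= 22MB of
> transient KVA, i.e. 22MB / 128KB ~= 180 concurrent MAXPHYS-sized
> remappings; the original 1/10 split would allow only about 90.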
>
> The patch was tested by dwh, please comment. I intend to commit it in
> several days.
>
> http://people.freebsd.org/~kib/misc/i386_maxbcache.1.patch
>
>


