Date:      Wed, 29 Jan 2014 17:34:34 -0800
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Adrian Chadd <adrian@freebsd.org>, Garrett Wollman <wollman@csail.mit.edu>, FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: Big physically contiguous mbuf clusters
Message-ID:  <20140130013434.GP93141@funkthat.com>
In-Reply-To: <20140129231121.GA18434@ox>
References:  <21225.20047.947384.390241@khavrinen.csail.mit.edu> <CAJ-VmomC5Ge3JwfUsgMrJ_rGqiYxfxR4wWzn5A-KAu7HBsueMw@mail.gmail.com> <20140129231121.GA18434@ox>

Navdeep Parhar wrote this message on Wed, Jan 29, 2014 at 15:11 -0800:
> On Wed, Jan 29, 2014 at 02:21:21PM -0800, Adrian Chadd wrote:
> > Hi,
> > 
> > On 29 January 2014 10:54, Garrett Wollman <wollman@csail.mit.edu> wrote:
> > > Resolved: that mbuf clusters longer than one page ought not be
> > > supported.  There is too much physical-memory fragmentation for them
> > > to be of use on a moderately active server.  9k mbufs are especially
> > > bad, since in the fragmented case they waste 3k per allocation.
> > 
> > I've been wondering whether it'd be feasible to teach the physical
> > memory allocator about >page sized allocations and to create zones of
> > slightly more physically contiguous memory.
> 
> I think this would be very useful.  For example, a zone_jumbo32 would
> hit a sweet spot -- enough to fit 3 jumbo frames and some loose change
> for metadata.  I'd like to see us improve our allocators and VM system

Actually, that is what currently happens.  I just verified this on
-current:

http://fxr.watson.org/fxr/source/vm/uma_core.c#L880

is where the allocation happens.  Notice uk_ppera (the keg's pages per
allocation); kgdb says:
print zone_jumbo9[0].uz_kegs.lh_first[0].kl_keg[0].uk_ppera
$7 = 3
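A quick sanity check of that number, as a sketch assuming 4 KiB pages
and treating a 9k jumbo cluster as 9 * 1024 bytes.  The helper names are
made up for illustration and are not UMA functions.  Rounding the item
size up to whole pages gives the 3 pages kgdb reports, and the leftover
in the last page is the 3k Garrett mentions:

```python
import math

PAGE_SIZE = 4096          # assumed 4 KiB pages
JUMBO9_BYTES = 9 * 1024   # 9k jumbo cluster size

def pages_per_alloc(item_size: int, page_size: int = PAGE_SIZE) -> int:
    """Whole pages needed to back one item of item_size bytes."""
    return math.ceil(item_size / page_size)

def slack_bytes(item_size: int, page_size: int = PAGE_SIZE) -> int:
    """Unused bytes left over when one item occupies its pages alone."""
    return pages_per_alloc(item_size, page_size) * page_size - item_size

print(pages_per_alloc(JUMBO9_BYTES))  # 3, same as uk_ppera above
print(slack_bytes(JUMBO9_BYTES))      # 3072, i.e. 3k per 9k cluster
```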

> to work better with larger contiguous allocations, rather than
> deprecating the larger zones.  It seems backwards to push towards
> smaller allocation units when installed physical memory in a typical
> system continues to rise.
> 
> Allocating 3 x 4K instead of 1 x 9K for a jumbo means 3x the number of
> vtophys translations, 3x the phys_addr/len traffic on the PCIe bus

I don't think that this will be an issue.  If we support a 9k jumbo
that is not physically contiguous (easy on main memory), the table we
use to look up the first physical page will likely hold the entries for
the next two pages as well, so I doubt there will be a significant
performance penalty.  Yes, we'll loop a few more times, but main memory
accesses are the real speed limiter in these situations...
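To illustrate, here is a userland sketch of building a scatter/gather
list for a buffer that may span several pages.  It is only loosely
modeled on what a bus_dma load does: page_table is a hypothetical dict
standing in for vtophys(), and build_sg_list is not a real kernel
function.  The loop does one lookup per page whether or not the pages
are contiguous; contiguity only decides how many (phys_addr, len)
segments come out:

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

def build_sg_list(vaddr, length, page_table):
    """Translate [vaddr, vaddr + length) into (phys_addr, len) segments.

    page_table is a hypothetical stand-in for vtophys(): it maps a
    virtual page number to a physical page number.  Physically adjacent
    pieces are merged into a single segment.
    """
    segs = []
    end = vaddr + length
    while vaddr < end:
        offset = vaddr % PAGE_SIZE
        chunk = min(PAGE_SIZE - offset, end - vaddr)
        # One translation per page touched by the buffer.
        paddr = page_table[vaddr // PAGE_SIZE] * PAGE_SIZE + offset
        if segs and segs[-1][0] + segs[-1][1] == paddr:
            # Physically contiguous with the previous piece: extend it.
            segs[-1] = (segs[-1][0], segs[-1][1] + chunk)
        else:
            segs.append((paddr, chunk))
        vaddr += chunk
    return segs

contiguous = {0: 100, 1: 101, 2: 102}
fragmented = {0: 100, 1: 250, 2: 7}
print(build_sg_list(0, 9 * 1024, contiguous))  # [(409600, 9216)]
print(build_sg_list(0, 9 * 1024, fragmented))  # [(409600, 4096), (1024000, 4096), (28672, 1024)]
```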

> (scatter list has to be fed to the chip and now it's 3x what it has to
> be), 3x the number of "wrapper" mbuf allocations (one for each 4K
> cluster) which will then be stitched together to form a frame, etc. etc.

And what is that as a percentage of overall traffic?  About 0.4%
(assuming a 16-byte scatter-list entry per 4k page)...  If your PCIe bus
is saturated and that extra 0.4% of traffic matters, then you have a
serious issue with your bus layout...
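The arithmetic behind the 0.4% figure, as a sketch: 16 bytes per
scatter-list entry is the assumption stated above, not a property of
any particular NIC.  Per 4k page that is 16/4096; for a 9k frame split
into three segments instead of one, the extra cost is two more entries
against 9216 bytes of payload:

```python
ENTRY_BYTES = 16          # assumed size of one (phys_addr, len) entry
PAGE_SIZE = 4096          # assumed 4 KiB pages
JUMBO9_BYTES = 9 * 1024   # 9k jumbo frame payload

def overhead_pct(entry_bytes: int, payload_bytes: int) -> float:
    """Descriptor bytes as a percentage of payload bytes moved."""
    return 100.0 * entry_bytes / payload_bytes

per_page = overhead_pct(ENTRY_BYTES, PAGE_SIZE)
# A fragmented 9k frame needs 3 entries instead of 1: 2 extra entries.
extra_per_9k_frame = overhead_pct(2 * ENTRY_BYTES, JUMBO9_BYTES)

print(f"{per_page:.2f}%")            # 0.39%
print(f"{extra_per_9k_frame:.2f}%")  # 0.35%
```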

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140130013434.GP93141>