Date:      Fri, 01 Feb 2013 14:41:32 +0100
From:      Andre Oppermann <andre@freebsd.org>
To:        Gleb Smirnoff <glebius@FreeBSD.org>
Cc:        net@FreeBSD.org
Subject:   Re: m_get2() name
Message-ID:  <510BC60C.4020500@freebsd.org>
In-Reply-To: <20130201131555.GJ91075@glebius.int.ru>
References:  <20130201120414.GG91075@FreeBSD.org> <510BBD66.4080903@freebsd.org> <20130201131555.GJ91075@glebius.int.ru>

On 01.02.2013 14:15, Gleb Smirnoff wrote:
> On Fri, Feb 01, 2013 at 02:04:38PM +0100, Andre Oppermann wrote:
> A> >    The m_get2() function allocates a single mbuf with enough space
> A> > to hold specified amount of data. It can return either a single mbuf,
> A> > an mbuf with a standard cluster, page size cluster, or jumbo cluster.
> A>
> A> While m_get2() is a good function, I'm not too happy with it returning
> A> jumbo clusters.  The size of jumbo cluster is not well specified and
> A> can be anything above 2K, from 4K to 16K or more.  The network stack
> A> hacker can't rely on any particular size above PAGE_SIZE to be present.
> A>
> A> So I recommend to make PAGE_SIZE the largest cluster size to be available
> A> in a single mbuf allocator.  PAGE_SIZE is a known quantity and plays well
> A> with the allocator.  Anything larger than PAGE_SIZE causes contig_malloc
> A> to be used as the requirement is physically contiguous pages for those
> A> clusters.  After some uptime this may become more difficult to allocate
> A> and can lead to premature allocation failures while still plenty of
> A> memory would be around.  The allocation overhead for such jumbo zones
> A> is higher in UMA than for PAGE_SIZE clusters.
>
> I am against API that forbids allocating jumbo clusters. The kernel has them,
> albeit their disadvantages. And API should offer them to drivers and modules.
> If some module doesn't want to get a jumbo clustered mbuf from m_get2(), then
> it should not request above PAGE_SIZE from m_get2() and that's all.

I'm not saying that it should be forbidden to allocate jumbo clusters.
It's just that they are not guaranteed to be available at all, or in
any particular size, except for PAGE_SIZE.  The risk is that a developer
uses the convenient m_get2() API for a 7K buffer and expects it to work
everywhere, which is not the case.  Something that can't be relied on
should not be automatic but explicit, with the surrounding run-time or
compile-time tests.
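
To illustrate the point, here is a rough userland sketch (not the kernel
code) of a size-class choice that stops at PAGE_SIZE and reports larger
requests back to the caller instead of silently handing out a jumbo
cluster.  All names (model_pick_class and the MODEL_* constants) are
hypothetical, and the sizes are assumed values for illustration only.

```c
#include <stddef.h>

/*
 * Illustrative sizes only (assumptions): the real values depend on the
 * platform and the kernel configuration.
 */
#define MODEL_MBUF_ROOM  200u   /* assumed data room in a plain mbuf */
#define MODEL_CLUSTER    2048u  /* assumed standard cluster size */
#define MODEL_PAGE_SIZE  4096u  /* assumed page size cluster */

enum model_class {
	MODEL_PLAIN_MBUF,
	MODEL_STD_CLUSTER,
	MODEL_PAGE_CLUSTER,
	MODEL_NEEDS_EXPLICIT_JUMBO	/* not served automatically */
};

/* Pick the smallest class that fits; never go above the page size. */
static enum model_class
model_pick_class(size_t size)
{
	if (size <= MODEL_MBUF_ROOM)
		return (MODEL_PLAIN_MBUF);
	if (size <= MODEL_CLUSTER)
		return (MODEL_STD_CLUSTER);
	if (size <= MODEL_PAGE_SIZE)
		return (MODEL_PAGE_CLUSTER);
	/* The caller has to ask for a jumbo cluster explicitly. */
	return (MODEL_NEEDS_EXPLICIT_JUMBO);
}
```

With these assumed sizes a 7K request ends up in the last case, so the
caller has to decide what to do rather than depend on a jumbo zone being
there.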

I'd also like to note that we have mbuf chains for the reason that
large contiguous allocations are difficult and not as performant as
PAGE_SIZEd and smaller ones.  If your code needs a > PAGE_SIZE jumbo
cluster you have done something wrong (to paraphrase PHK).
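
As a small worked example of the chain alternative, the following
userland sketch computes how many page size clusters a chain would need
for a given length.  The function names and the 4096 byte page size are
assumptions for illustration, not kernel API.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/* Number of page size clusters needed to hold len bytes (rounded up). */
static size_t
model_chain_clusters(size_t len)
{
	return ((len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE);
}

/* Bytes carried by the last cluster of that chain. */
static size_t
model_last_cluster_len(size_t len)
{
	size_t rem = len % MODEL_PAGE_SIZE;

	return ((len != 0 && rem == 0) ? MODEL_PAGE_SIZE : rem);
}
```

Under that assumption a 7K buffer becomes a chain of two clusters, the
first full and the second holding 3072 bytes, and no physically
contiguous allocation above a page is needed.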

The only reason we actually got > PAGE_SIZE jumbo clusters is old
NICs which couldn't do inbound S/G DMA.  These are long gone and every
jumbo frame capable NIC can do S/G DMA into, for example, PAGE_SIZE
mbuf clusters.

For casual readers: a jumbo cluster > PAGE_SIZE is contiguous not only
in virtual kernel address space but also physically in actual RAM.  While
both may cause some allocation difficulties, it's mostly the physical part
in pmap which eventually suffers from fragmentation.

-- 
Andre



