Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 25 Nov 2002 11:31:39 -0500 (EST)
From:      Robert Watson <rwatson@freebsd.org>
To:        Andrew Gallatin <gallatin@cs.duke.edu>
Cc:        Luigi Rizzo <rizzo@icir.org>, current@freebsd.org
Subject:   Re: mbuf header bloat ?
Message-ID:  <Pine.NEB.3.96L.1021125111737.36232C-100000@fledge.watson.org>
In-Reply-To: <15840.8629.324788.887872@grasshopper.cs.duke.edu>

next in thread | previous in thread | raw e-mail | index | archive | help

On Sat, 23 Nov 2002, Andrew Gallatin wrote:

> I propose that we make struct label portion of the pkthdr compile-time
> conditional on MAC.  The assumption is that you will move the MAC label
> to an m_tag sometime after 5.0-RELEASE. 

This weekend I spent about six hours looking at what it would take to move
MAC label data into m_tags.  While in theory it is a workable idea, it
turns out our m_tag implementation is fairly far from being ready to
handle something like this.  I ran into the following immediate problems:

(1) When packet headers are copied using m_copy_pkthdr(), different
    consumers have different expectations for what the resulting semantics
    are for m_tag data -- some want it duplicated, others want it moved.
    In practice, it is only ever moved, so consumers that expect
    duplication are in for a surprise.  We need to re-implement the packet
    header copying code so that it can generate a failure (because it
    involves allocation), and separate the duplicate and move abstractions
    to get clean semantics.  I exchanged some e-mail with Sam Leffler on
    the topic, and apparently OpenBSD has already made these changes, or
    similar ones, and we should do the same for 5.1.

(2) m_tag's don't have a notion that the data carried in a tag is
    multi-dimmensional and may require special destructor behavior.  While
    it does centralize copying and free'ing of data, it handles this
    purely with bcopy, malloc, and free, which is not appropriate for use
    with MAC labels, since they may contain a variety of per-policy data
    that may require special handling (reference count management, etc).
    I tried putting tag-specific release/free/... code in the m_tag
    central routines, and it looks like that would work, although it
    eventually would lead to a lot of junk in the m_tag code.  We might
    want to consider m_tag entry free/copy pointers of some sort, but I'm
    not sure if we want to go there.  Adding the MAC stuff to the
    m_tag_{free,copy,...} calls won't break the ABI, whereas adding free
    and copy pointers to the tags themselves would.

(3) Not all code generating packets properly initializes m_tag field.  The
    one example I've come across so far is the use of m_getcl() to grab
    mbufs with an attached cluster -- it never initializes the SLIST
    properly, as far as I can tell.  Right now that's used in device
    drivers, and also in the BPF packet generation code.  If the header is
    zero'd, this may be OK due to an implicit proper initialization, but
    this is concerning.  We need to do more work to normalize packet
    handling.

(4) Code still exists that improperly hand-copies mbuf packet header data.
    if_loop.c contains some particular bogus code that also triggers a
    panic in the MAC code for the same reason.  If m_tag data ever passes
    through if_loop and hits the re-alignment case introduced by KAME, the
    system will panic when the tag data is free'd.  This code all needs to
    be normalized, and proper semantics must be enforced. 

> This will immediately reduce the size of mbufs for the vast majority of
> users, and will prevent a 4.1.1 like flag-day for 3rd party network
> driver vendors.  The only downside is that the few MAC users will not be
> able to use 3rd party binary network drivers until the MAC label is put
> into an m_tag.  This seems fair, as the only people inconvienced are the
> people who want the labels and they are motivated to move them to an
> m_tag.  But that's easy for me to say, since I don't run MAC, and I may
> be missing something big. 

In the past I have looked at adding conditionally-defined components to
struct mbuf and other key kernel data structures.  While the condition of
the tree is improving from this perspective due to better isolation of
user and kernel data structures, the result is still incredibly messy,
especially if you key the conditionally defined sections on a kernel
option.  mbuf.h is included in a number of userland applications -- some
expected, such as the ipfilter test framework, but others less expected --
such as BIND.  I'm very wary of the notion of adding conditionally defined
portions of struct mbuf on this (and other) bases.  I'll take a look at
whether many of the obvious foot-shooting scenarios still exist since I
last tried it.  Moving to m_tag looks like a reasonable long-term
strategy, but until the m_tag code is substantially more mature, it isn't
realistic.  Otherwise, I might have attempted to push through a change to
it now before RC1.

BTW, do you have any recent large-scale measurements of packet size
distribution?  In local tests and measurements, the additional 20 bytes on
i386 didn't bump the remaining mbuf data space sufficiently low to
substantially change the behavior of the stack.  However, I haven't done
measurements against the 64-bit variation.  In practice, a number of
network interfaces now seem to use clustered mbufs and not attempt to use
the in-mbuf storage space...  All my packet distribution measurements come
from a typical ISP environment, but may not match what is seen in
large-scale backbone environments.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.NEB.3.96L.1021125111737.36232C-100000>