From owner-freebsd-net  Tue Jul  2  0:10:56 2002
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id E914E37B400
	for <net@freebsd.org>; Tue,  2 Jul 2002 00:10:50 -0700 (PDT)
Received: from iguana.icir.org (iguana.icir.org [192.150.187.36])
	by mx1.FreeBSD.org (Postfix) with ESMTP id A4E8743E0A
	for <net@freebsd.org>; Tue,  2 Jul 2002 00:10:50 -0700 (PDT)
	(envelope-from rizzo@iguana.icir.org)
Received: (from rizzo@localhost)
	by iguana.icir.org (8.11.6/8.11.3) id g627AoZ02468;
	Tue, 2 Jul 2002 00:10:50 -0700 (PDT)
	(envelope-from rizzo)
Date: Tue, 2 Jul 2002 00:10:50 -0700
From: Luigi Rizzo <rizzo@icir.org>
To: net@freebsd.org
Subject: Mbuf allocator performance (was Should we keep a cache of mbuf+cluster ready for use ?)
Message-ID: <20020702001050.B2250@iguana.icir.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-net.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-net>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-net>
X-Loop: FreeBSD.org

[Bcc to -current as relevant there]

As a followup to my question below, I ran a simple experiment
on a -current box acting as a router, using two "em" cards and
DEVICE_POLLING (just for stability).

Basically all of the code is below: the receiving side first
tries to grab an mbuf+cluster from the free pool, and falls back
to the standard method on failure; the sending side puts the
freed buffer back on the free pool when it qualifies (single
mbuf, writable cluster, pool not full).

This is simpler than what would have to go into an
m_freem()/mgethdr_cluster() pair, because the driver is already
holding a lock (Giant, in this case), so no further locking is
needed. Still, here are the data points:

				CURRENT		STABLE (*)

fastfwd=1, pool_max=0		276 kpps	365 kpps
fastfwd=1, pool_max=50		355 kpps	383 kpps

fastfwd=0, pool_max=0		195 kpps	142 kpps
fastfwd=0, pool_max=50		227 kpps	146 kpps

(*) This version of STABLE that I am using for comparison has some
proprietary optimizations which make it a bit faster than normal.
However it still uses the old ipfw code, which is invoked when
fastfwd=0, and is significantly slower than the new one.
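
To put the table in relative terms, the gain from the pool can be
computed per row. The helper below is only an illustration: the names
gain_pct() and print_gains() are made up for this note, and the
fastfwd=0, pool_max=0 entry for CURRENT is taken to be 195 kpps like
the other entries. It works out to roughly 29% and 16% on CURRENT, and
roughly 5% and 3% on STABLE.

```c
#include <assert.h>
#include <stdio.h>

/* Relative gain, in percent, of going from `base` to `pooled` (kpps). */
static double
gain_pct(double base, double pooled)
{
	assert(base > 0.0);	/* avoid dividing by zero */
	return ((pooled - base) / base * 100.0);
}

/* One line per (branch, fastfwd) pair, pool_max=0 vs pool_max=50. */
static void
print_gains(void)
{
	printf("CURRENT fastfwd=1: %+.1f%%\n", gain_pct(276.0, 355.0));
	printf("STABLE  fastfwd=1: %+.1f%%\n", gain_pct(365.0, 383.0));
	/* 195 is assumed to be in kpps, see the note above. */
	printf("CURRENT fastfwd=0: %+.1f%%\n", gain_pct(195.0, 227.0));
	printf("STABLE  fastfwd=0: %+.1f%%\n", gain_pct(142.0, 146.0));
}
```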

Now this really seems to call for adding this interface to the
mbuf subsystem. I believe we have to find a name for the allocator
(the deallocator might well go into m_freem(), depending on how we
implement the locking) and decide whether it makes sense to lock
mgethdr_cluster() under per-cpu locks or under Giant, or even let
the caller make sure that it holds the proper lock before
invoking the procedure (as I expect the "producers" or "consumers"
of these pairs to be located in the network stack, chances are that
they already hold a lock on Giant).
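
To make the "caller holds the lock" option a bit more concrete, here
is a rough user-space sketch, not kernel code. Everything in it is
hypothetical: struct buf, pool_get_locked(), pool_put_locked() and
pool_get() are names invented for this note, and a pthread mutex
stands in for Giant. The _locked variants do no locking of their own,
and the plain variant is a thin wrapper for callers that do not hold
the lock yet.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for an mbuf+cluster pair; only the link field matters here. */
struct buf {
	struct buf *nextpkt;
};

static struct buf *pool;	/* head of the LIFO free pool */
static int pool_now;		/* buffers currently cached */
static int pool_max = 50;	/* cap on the pool size */
static pthread_mutex_t pool_mtx = PTHREAD_MUTEX_INITIALIZER; /* plays Giant */

/* Caller must hold pool_mtx. Returns NULL when the pool is empty. */
static struct buf *
pool_get_locked(void)
{
	struct buf *b = pool;

	if (b != NULL) {
		pool = b->nextpkt;
		b->nextpkt = NULL;
		pool_now--;
	}
	return (b);
}

/* Caller must hold pool_mtx. Returns 0 if full (caller frees normally). */
static int
pool_put_locked(struct buf *b)
{
	assert(b != NULL);
	if (pool_now >= pool_max)
		return (0);
	b->nextpkt = pool;
	pool = b;
	pool_now++;
	return (1);
}

/* Variant for callers that do not already hold the lock. */
static struct buf *
pool_get(void)
{
	struct buf *b;

	pthread_mutex_lock(&pool_mtx);
	b = pool_get_locked();
	pthread_mutex_unlock(&pool_mtx);
	return (b);
}
```

With this split, a driver that already runs under the lock would call
the _locked functions directly and pay nothing extra, which is the
situation in the em experiment.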


	cheers
	luigi


The code:

	struct mbuf *em_pool;

	static int em_pool_max = 50;
	SYSCTL_INT(_hw, OID_AUTO, em_pool_max, CTLFLAG_RW,
		&em_pool_max, 0, "Max size of mbuf pool");
	static int em_pool_now;
	SYSCTL_INT(_hw, OID_AUTO, em_pool_now, CTLFLAG_RD,
		&em_pool_now, 0, "Current size of mbuf pool");

	... in em_get_buf() ..

		if (em_pool) {
			mp = em_pool;
			em_pool = mp->m_nextpkt;
			em_pool_now--;
			goto have_it;
		}

	... in em_clean_transmit_interrupts() ...
		if ((m = tx_buffer->m_head)) {
			if (em_pool_now < em_pool_max &&
			    m->m_next == NULL &&
			    (m->m_flags & M_EXT) &&
			    M_WRITABLE(m)) {
				m->m_nextpkt = em_pool;
				em_pool = m;
				em_pool_now++;
			} else
				m_freem(m);
			tx_buffer->m_head = NULL;
		}
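
For anyone who wants to play with the idea outside the kernel, here is
a minimal user-space sketch of the same recycle/allocate pair. It is
only a mock: struct mock_mbuf, MOCK_M_EXT, MOCK_MCLBYTES,
pool_recycle() and pool_alloc() are invented for this note and are
not the real mbuf layout or API, and the "writable" test is reduced
to a reference count of 1. The allocator resets m_data and m_len on
reuse, as suggested in the original message quoted below.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_M_EXT	0x1	/* buffer has external storage (a cluster) */
#define MOCK_MCLBYTES	2048	/* cluster size used by this mock */

/* Greatly simplified stand-in for struct mbuf; not the kernel layout. */
struct mock_mbuf {
	struct mock_mbuf *m_next;	/* next buffer in the same packet */
	struct mock_mbuf *m_nextpkt;	/* next packet; reused as pool link */
	int	 m_flags;
	int	 m_len;
	char	*m_data;		/* current start of data */
	char	*ext_buf;		/* start of the cluster */
	int	 ext_refs;		/* references to the cluster */
};

static struct mock_mbuf *pool;
static int pool_now, pool_max = 50;

/* Try to cache a released buffer. Returns 0 if the caller must free it. */
static int
pool_recycle(struct mock_mbuf *m)
{
	assert(m != NULL);
	if (pool_now >= pool_max || m->m_next != NULL ||
	    !(m->m_flags & MOCK_M_EXT) || m->ext_refs != 1)
		return (0);
	m->m_nextpkt = pool;
	pool = m;
	pool_now++;
	return (1);
}

/* Fast path of the allocator: NULL means "use the standard method". */
static struct mock_mbuf *
pool_alloc(void)
{
	struct mock_mbuf *m = pool;

	if (m == NULL)
		return (NULL);
	pool = m->m_nextpkt;
	pool_now--;
	m->m_nextpkt = NULL;
	m->m_data = m->ext_buf;	/* undo any offset left by the last user */
	m->m_len = MOCK_MCLBYTES;
	return (m);
}
```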


On Sat, Jun 29, 2002 at 02:53:03PM -0700, Luigi Rizzo wrote:
> Hi,
> during some experiments i was doing recently, i noticed that there
> is a significant improvement in the forwarding speed (especially
> at very high speeds) if we keep a small pool of mbuf+cluster
> ready for use. This is because most network drivers do something
> like this
> 
>                 MGETHDR(m_new, M_DONTWAIT, MT_DATA);
>                 if (m_new == NULL)
>                         return(ENOBUFS);
> 
>                 MCLGET(m_new, M_DONTWAIT);
>                 if (!(m_new->m_flags & M_EXT)) {
>                         m_freem(m_new);
>                         return(ENOBUFS);
>                 }
> 
> when replenishing the receive buffers, and both macros are quite
> long even if there are available blocks in the free lists. We can
> store buffers of this form when/if they are released with some code
> like this:
> 
> 	if (my_pool_now < my_pool_max && m->m_next == NULL &&
> 			(m->m_flags & M_EXT) && M_EXT_WRITABLE(m) ) {
> 		m->m_nextpkt = my_pool;
> 		m->m_data = m->m_ext.ext_buf;
> 		m->m_len = m->m_pkthdr.len = MCLBYTES;
> 		my_pool = m;
> 		my_pool_now++;
> 	} else {
> 		... rest of m_freem() ...
> 	}
> 
> and save a lot of overhead (we just need to reset m_data and
> m_len and m_pkthdr.len) when someone wants to allocate them.
> 
> Is there interest in committing some code like this to
> mbuf.h and maybe uipc_mbuf*.c ?
> 
> 	cheers
> 	luigi
> 
