Date: Tue, 2 Jul 2002 08:04:23 -0400 From: Bosko Milekic <bmilekic@unixdaemons.com> To: Luigi Rizzo <rizzo@icir.org> Cc: net@FreeBSD.ORG Subject: Re: Mbuf allocator performance (was Should we keep a cache of mbuf+cluster ready for use ?) Message-ID: <20020702080423.A69693@unixdaemons.com> In-Reply-To: <20020702001050.B2250@iguana.icir.org>; from rizzo@icir.org on Tue, Jul 02, 2002 at 12:10:50AM -0700 References: <20020702001050.B2250@iguana.icir.org>
next in thread | previous in thread | raw e-mail | index | archive | help
There are several problems with your "simple" code that make it simpler than what's in -CURRENT and in -STABLE and therefore yield [obviously] seemingly better times. 1) sleeping in the case of a system exhausted of mbufs and clusters is now broken because code that frees to this "combined" list will not generate wake ups; even if you build this functionality in, you would have to code the wakeup for a "special" case where one obtains both a cluster and an mbuf; it'll get complicated. 2) no per-CPU caches, you bring contention (in -CURRENT) from per-CPU back to global, which you probably don't care about now, but which will matter when Giant goes completely. 3) you violate write/read producer/consumer model. In -CURRENT, we try to not have the code doing the freeing write to any part of the mbuf, thus hopefully avoiding cache invalidation from occuring during a free that happens from a thread that doesn't usually write to the mbuf at all. I think that because of (1) and (2), especially, and because the kernel in -CURRENT is the way it is right now, you're getting better performance numbers. Try exhausting the system and seeing how far you get. I suspect you'll eventually just hang (because you'll allocate a bunch of stuff violating the real interface and then won't be able to wakeup and Do The Right Thing). While, as I said before, I agree that we should have an interface that allocates both an mbuf and a cluster, I don't think that this is the right approach. I believe that the correct approach would be to couple the mbuf and cluster allocation under a single per-CPU lock/unlock operation, which isn't too difficult to do. On Tue, Jul 02, 2002 at 12:10:50AM -0700, Luigi Rizzo wrote: > [Bcc to -current as relevant there] > > As a followup to my question below, i did some simple experiment > on a -current box acting as a router, using two "em" cards and > DEVICE_POLLING (just for stability). > > The whole of the code is basically below -- the receiving > side tries to grab the mbuf+cluster from the free pool first, > and falls back to the standard method on failure; the sending > side tries to attach the freed buffer to the free pool > if it is the case. > > There is a simplification here with respect to what could > go into an m_freem()/mgethdr_cluster() pair because the driver > is already holding a lock, on Giant in this case, so you do not need > further locking. Still, to come to the data points: > > CURRENT STABLE (*) > > fastfwd=1, pool_max=0 276kpps 365 kpps > fastfwd=1, pool_max=50 355kpps 383 kpps > > fastfwd=0, pool_max=0 195 pps 142 kpps > fastfwd=0, pool_max=50 227 kpps 146 kpps > > (*) This version of STABLE that I am using for comparison has some > proprietary optimizations which make it a bit faster than normal. > However it still uses the old ipfw code, which is invoked when > fastfwd=0, and is significantly slower than the new one. > > Now this really seems to call for adding this interface into the > mbuf subsystem. I believe we have to find a name for the allocator > (the deallocator might well go into m_freem(), depending on how we > implement the locking) and whether it makes sense to lock > mgethdr_cluster() under per-cpu locks or under Giant, or even let > the caller make sure that it holds the proper lock before trying > to invoke the procedure (as i expect the "producers" or "consumers" > of these pairs to be located in the network stack, chances are that > they already hold a lock on Giant). > > > cheers > luigi > > > The code: > > struct mbuf *em_pool; > > static int em_pool_max = 50; > SYSCTL_INT(_hw, OID_AUTO, em_pool_max, CTLFLAG_RW, > &em_pool_max,0,"max size of mbuf pool"); > static int em_pool_now; > SYSCTL_INT(_hw, OID_AUTO, em_pool_now, CTLFLAG_RD, > &em_pool_now,0,"Current size of mbuf pool"); > > ... in em_get_buf() .. > > if (em_pool) { > mp = em_pool; > em_pool = mp->m_nextpkt; > em_pool_now--; > goto have_it; > } > > ... in em_clean_transmit_interrupts() ... > if ((m = tx_buffer->m_head)) { > if (em_pool_now < em_pool_max && > m->m_next == NULL && > m->m_flags & M_EXT && > M_WRITABLE(m) ) { > m->m_nextpkt = em_pool; > em_pool = m; > em_pool_now++; > } else > m_freem(m); > tx_buffer->m_head = NULL; > } Regards, -- Bosko Milekic bmilekic@unixdaemons.com bmilekic@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-net" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020702080423.A69693>