Date:      Mon, 8 Apr 2002 20:36:31 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: performance of mbufs vs contig buffers?
Message-ID:  <15538.14223.494295.766977@grasshopper.cs.duke.edu>
In-Reply-To: <3CB21FCF.6B018811@mindspring.com>
References:  <15538.5971.620626.548508@grasshopper.cs.duke.edu> <3CB21FCF.6B018811@mindspring.com>

Terry Lambert writes:
 > Andrew Gallatin wrote:
 > > After updating the firmware on our 2-gigabit NIC to allow enough
 > > scatter entries per packet to stock the 9K (jumbo frame) receive
 > > rings with cluster mbufs rather than contigmalloc'ed buffers(*), I
 > > noticed a dramatic performance decrease: netperf TCP_STREAM
 > > performance dropped from 1.6Gb/sec to 1.2Gb/sec.
 > 
 > [ ... ]
 > 
 > > Is it possible that my problems are being caused by cache misses
 > > on cluster mbufs occurring when copying out to userspace as another
 > > packet is being DMA'ed up?  I'd thought that since the cache line size
 > > is 32 bytes, I'd be pretty much equally screwed either way.
 > 
 > [ ... ]
 > 
 > > Does anybody have any ideas why contig malloc'ed buffers are so much
 > > quicker?
 > 
 > Instrument m_pullup(), and see how much it's being called in
 > both cases.  Probably you are seeing the 2-byte misalignment
 > of the TCP payload in the ethernet packet.

The TCP payload is aligned.  We stock the rings so that the
ethernet header is intentionally misaligned, which makes the IP
portion of the packet land aligned.  (actually, we encapsulate the
ethernet traffic behind another 16-bit header, so everything ends up
aligned without the +2/-2 stuff).
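
For reference, the usual +2 idiom in drivers that do need it looks
roughly like this (a generic sketch, not our code):

    struct mbuf *m;

    MGETHDR(m, M_DONTWAIT, MT_DATA);
    if (m == NULL)
            return (ENOBUFS);
    MCLGET(m, M_DONTWAIT);
    if ((m->m_flags & M_EXT) == 0) {
            m_freem(m);
            return (ENOBUFS);
    }
    /*
     * Offset the data pointer by 2 bytes so the 14-byte ethernet
     * header ends on a 4-byte boundary; the IP header and the TCP
     * payload behind it then land naturally aligned.
     */
    m->m_data += 2;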

 > My other guess would be that the clusters you are dealing
 > with are non-contiguous.  This has both scatter/gather
 > implications, and cache-line implications when using them.

Please elaborate...  What sort of scatter/gather implications?
Microbenchmarks don't show much of a difference DMA'ing to
non-contigous vs. contigous pages. (over 400MB/sec in all cases).
Also, we get close to link speed DMA'ing to user space, and with page
coloring, that virtually guarantees that the pages are not physically
contigous.

Based on the UDP behaviour, I think that it's the cache implications.
The bottleneck seems to be when copyout() reads the recently DMA'ed
data.  The driver reads the first few dozen bytes (so as to touch up
the csum by subtracting off the extra bits the DMA engines added in).
We do
hardware csum offloading, so the entire packet is not read until
copyout() is called.
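
The touch-up itself is just 1's complement arithmetic over the extra
words; something like this (simplified sketch, the helper name is
made up):

    /*
     * Back a run of words out of a hardware 1's complement sum.
     * Subtracting x from a 1's complement sum is the same as
     * adding ~x (RFC 1624), then folding the carries back in.
     */
    static __inline u_int16_t
    csum_sub_words(u_int32_t sum, const u_int16_t *p, int nwords)
    {
            while (nwords-- > 0)
                    sum += (u_int16_t)~(*p++);
            while (sum > 0xffff)
                    sum = (sum & 0xffff) + (sum >> 16);
            return ((u_int16_t)sum);
    }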

 > Having thought about this problem before, I think that what
 > you probably need is to chunk the buffers up, and treat them
 > as M_EXT type mbufs (e.g. go with contigmalloc).

I really, really hate doing this for a variety of reasons.  Mainly
that the user may not expect that the NIC driver is doing this & it
may take her a while to realize that adjusting NMBCLUSTERS has no
effect.  Although... Hmmm..  I could use a small pool of private
buffers while I have them & then fall back to cluster mbufs when I
run out.
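
Something like this, maybe (rough sketch; jumbo_get()/jumbo_free()
and JUMBO_BUFSZ are made-up names for a driver-private pool of
contigmalloc'ed 9K buffers, sc is the softc, and the MEXTADD usage
is from memory):

    struct mbuf *m;
    caddr_t buf;

    MGETHDR(m, M_DONTWAIT, MT_DATA);
    if (m == NULL)
            return (NULL);
    if ((buf = jumbo_get(sc)) != NULL) {
            /* attach a private contig buffer as external storage */
            MEXTADD(m, buf, JUMBO_BUFSZ, jumbo_free, sc);
    } else {
            /* pool empty: fall back to an ordinary cluster mbuf */
            MCLGET(m, M_DONTWAIT);
            if ((m->m_flags & M_EXT) == 0) {
                    m_freem(m);
                    return (NULL);
            }
    }
    return (m);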

I'd still like to fully understand the problem though; sweeping it
under the rug bothers me.

 > To be able to use "generic" mbufs for this, what's really
 > needed is the ability to have variable size mbufs.  At the
 > very least, I think a single mbuf should be of a size so
 > that the MTU fits inside it.  Fixing this would be a large
 > amount of work, and the gain is uncertain.
 > 
 > You can get a minor idea of the available gain by looking
 > at the Tigon II firmware changes to use page-based buffer
 > allocations, per Bill Paul & Co.

If you're thinking of what I'm thinking of (the zero copy stuff), I
wrote that code. ;)

I seem to remember you talking about seeing a 10% speedup from using 
4MB pages for cluster mbufs.   How did you do that?  I'd like to see
what effect it has with this workload.

Thanks!

Drew
