Date:      Mon, 08 Apr 2002 21:49:57 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Andrew Gallatin <gallatin@cs.duke.edu>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: performance of mbufs vs contig buffers?
Message-ID:  <3CB272F5.FF9D2C8F@mindspring.com>
References:  <15538.5971.620626.548508@grasshopper.cs.duke.edu> <3CB21FCF.6B018811@mindspring.com> <15538.14223.494295.766977@grasshopper.cs.duke.edu>

Andrew Gallatin wrote:
>  > My other guess would be that the clusters you are dealing
>  > with are non-contiguous.  This has both scatter/gather
>  > implications, and cache-line implications when using them.
> 
> Please elaborate...  What sort of scatter/gather implications?
> Microbenchmarks don't show much of a difference DMA'ing to
> non-contigous vs. contigous pages. (over 400MB/sec in all cases).
> Also, we get close to link speed DMA'ing to user space, and with page
> coloring, that virtually guarantees that the pages are not physically
> contigous.

L2 cache busting would be an immediate result of scatter/gather
DMA.  And once you hit the pool size, then you would lose
considerable speed to wait states.  In general, mbuf clusters
span many cache lines, so a scattered set of clusters displaces
more of the cache than one contiguous buffer would.
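The arithmetic here is rough (the sizes below are assumptions typical
of hardware of that era, not figures from this thread), but it shows
why a working set of scattered clusters busts an L2 quickly:

```c
#include <assert.h>

/* Assumed illustrative sizes: 64-byte cache lines, 2KB mbuf
 * clusters (MCLBYTES), and a 256KB L2 cache. */
enum { CACHE_LINE = 64, MCLBYTES = 2048, L2_SIZE = 256 * 1024 };

/* Each cluster spans many cache lines, and the L2 can hold only
 * a small number of whole clusters at once. */
int lines_per_cluster = MCLBYTES / CACHE_LINE;  /* 32 lines/cluster */
int clusters_in_l2    = L2_SIZE / MCLBYTES;     /* 128 clusters */
```

At link rates of hundreds of MB/sec, 128 clusters' worth of data is
streamed through in well under a millisecond, so DMA into scattered
clusters keeps evicting whatever copyout() is about to read.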


> Based on the UDP behaviour, I think that it's cache implications.  The
> bottleneck seems to be when copyout() reads the recently DMA'ed data.
> The driver reads the first few dozen bytes (so as to touch up the csum
> by subtracting off the extra bits the DMA engines added in).  We do
> hardware csum offloading, so the entire packet is not read until
> copyout() is called.
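The csum touch-up described above works because the Internet checksum
is ones-complement arithmetic: the contribution of the extra DMA'ed
bytes can be subtracted back out without re-reading the whole packet.
A minimal userland sketch (ocsum/ocsub are hypothetical names, not the
driver's code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones-complement sum (the Internet checksum core),
 * accumulating big-endian 16-bit words starting from 'start'. */
static uint16_t
ocsum(const uint8_t *p, size_t len, uint32_t start)
{
	uint32_t sum = start;

	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len)			/* trailing odd byte */
		sum += (uint32_t)p[0] << 8;
	while (sum >> 16)		/* fold end-around carries */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)sum);
}

/* Subtract b from a in ones-complement arithmetic: a + ~b, folded.
 * This is the "touch up": remove the extra bytes' contribution from
 * the hardware's checksum over the larger DMA'ed region. */
static uint16_t
ocsub(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + (uint16_t)~b;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)sum);
}
```

Only the few leading "extra" bytes get touched by the CPU; the rest of
the packet stays cold until copyout(), which matches the behaviour
Andrew describes.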

I don't understand the copyout requirement here...


> I seem to remember you talking about seeing a 10% speedup from using
> 4MB pages for cluster mbufs.   How did you do that?  I'd like to see
> what affect it has with this workload.

I allocated them at system startup time, in machdep.c, out of
contiguous physical memory, and then established 4M mappings
for the data.  Then I linked all the mbufs onto the mbuf free
list, so that allocations would use my mbufs.  The benefit came
from the reduction in TLB thrashing that otherwise occurred.
The overall speedup was closer to 16%.
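In outline, the startup-time pool looks like the sketch below.  The
names are mine and malloc() stands in for the reservation; in the
kernel the backing memory would be physically contiguous and mapped
with 4M pages, so the whole pool costs one TLB entry per 4MB rather
than one per 4KB page:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CLBYTES   2048		/* mbuf cluster size */
#define POOLBYTES (4u << 20)	/* one 4MB region per large mapping */

/* Free clusters are threaded through their own first bytes. */
struct clfree {
	struct clfree *next;
};

static uint8_t *cl_pool;		/* base of the contiguous region */
static struct clfree *cl_freelist;

/* Carve the region into clusters and push them all on the free list. */
static int
cl_pool_init(void)
{
	size_t off;

	cl_pool = malloc(POOLBYTES);	/* kernel: contiguous physmem */
	if (cl_pool == NULL)
		return (-1);
	for (off = 0; off < POOLBYTES; off += CLBYTES) {
		struct clfree *f = (struct clfree *)(cl_pool + off);

		f->next = cl_freelist;
		cl_freelist = f;
	}
	return (0);
}

/* Allocation and free are O(1) list operations; every cluster handed
 * out lives inside the one large-page-backed region. */
static void *
cl_alloc(void)
{
	struct clfree *f = cl_freelist;

	if (f != NULL)
		cl_freelist = f->next;
	return (f);
}

static void
cl_free(void *p)
{
	struct clfree *f = p;

	f->next = cl_freelist;
	cl_freelist = f;
}
```

Because all clusters fall inside a handful of 4M mappings, back-to-back
DMA and copyout() of many clusters touches only a few TLB entries,
which is where the ~16% came from.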

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
