Date:      Thu, 4 Jul 2002 23:13:21 -0600
From:      "Kenneth D. Merry" <ken@kdm.org>
To:        Bosko Milekic <bmilekic@unixdaemons.com>
Cc:        Andrew Gallatin <gallatin@cs.duke.edu>, current@FreeBSD.ORG, net@FreeBSD.ORG
Subject:   Re: virtually contig jumbo mbufs (was Re: new zero copy sockets snapshot)
Message-ID:  <20020704231321.A42134@panzer.kdm.org>
In-Reply-To: <20020705002056.A5365@unixdaemons.com>; from bmilekic@unixdaemons.com on Fri, Jul 05, 2002 at 12:20:56AM -0400
References:  <20020619090046.A2063@panzer.kdm.org> <20020619120641.A18434@unixdaemons.com> <15633.17238.109126.952673@grasshopper.cs.duke.edu> <20020619233721.A30669@unixdaemons.com> <15633.62357.79381.405511@grasshopper.cs.duke.edu> <20020620114511.A22413@unixdaemons.com> <15634.534.696063.241224@grasshopper.cs.duke.edu> <20020620134723.A22954@unixdaemons.com> <15652.46870.463359.853754@grasshopper.cs.duke.edu> <20020705002056.A5365@unixdaemons.com>

On Fri, Jul 05, 2002 at 00:20:56 -0400, Bosko Milekic wrote:
> On Thu, Jul 04, 2002 at 04:59:02PM -0400, Andrew Gallatin wrote:
> >  >   I believe that the Intel chips do "virtual page caching" and that the
> >  > logic that does the virtual -> physical address translation sits between
> >  > the L2 cache and RAM.  If that is indeed the case, then your idea of
> >  > testing with virtually contiguous pages is a good one.
> >  >   Unfortunately, I don't know if the PIII is doing speculative
> >  > cache-loads, but it could very well be the case.  If it is and if in
> >  > fact the chip does caching based on virtual addresses, then providing it
> >  > with virtually contiguous address space may yield better results.  If
> >  > you try this, please let me know.  I'm extremely interested in seeing
> >  > the results!
> > 
> > contigmalloc'ed private jumbo mbufs (same as bge, if_ti, etc):
> > 
> > % iperf -c ugly-my -l 32k -fm
> > ------------------------------------------------------------
> > Client connecting to ugly-my, TCP port 5001
> > TCP window size:  0.2 MByte (default)
> > ------------------------------------------------------------
> > [  3] local 192.168.1.3 port 1031 connected with 192.168.1.4 port 5001
> > [ ID] Interval       Transfer     Bandwidth
> > [  3]  0.0-10.0 sec  2137 MBytes  1792 Mbits/sec
> > 
> > 
> > 
> > malloc'ed, physically discontiguous private jumbo mbufs:
> > 
> > % iperf -c ugly-my -l 32k -fm
> > ------------------------------------------------------------
> > Client connecting to ugly-my, TCP port 5001
> > TCP window size:  0.2 MByte (default)
> > ------------------------------------------------------------
> > [  3] local 192.168.1.3 port 1029 connected with 192.168.1.4 port 5001
> > [ ID] Interval       Transfer     Bandwidth
> > [  3]  0.0-10.0 sec  2131 MBytes  1788 Mbits/sec
> > 
> > 
> > So I'd be willing to believe that the 4Mb/sec loss was due to 
> > the extra overhead of setting up 2 additional DMAs. 
> > 
> > 
> > So it looks like this idea would work. 
> 
>  Yes, it certainly confirms the virtual-based caching assumptions.  I
>  would like to provide virtually contiguous large buffers and believe I
>  can do that via mb_alloc... however, they would be several wired-down
>  pages.  Would this be in line with the requirements that these buffers
>  would have, in your mind?  (wired-down means that your buffers will
>  come out exactly as they would out of malloc(), so if you were using
>  malloc() already, I'm assuming that wired-down is OK).
> 
>  I think I can allocate the jumbo buffers via mb_alloc from the same map
>  as I allocate clusters from - the clust_map - and keep them in
>  buckets/slabs in per-CPU caches, like I do for mbufs and regular
>  clusters right now.  Luigi is in the process of doing some optimisation
>  work around mb_alloc and I'll probably be doing the SMP-specific stuff
>  after he's done, so once that's taken care of, we can take a stab at
>  this if you think it's worth it.

If you do implement this, it would also be nice to have some sort of
standardized page-walking function to extract the physical addresses.
(Otherwise every driver will end up implementing its own loop to do it.)
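
Something along the following lines is what I have in mind (just a rough
sketch; the function name and the phys_seg structure are made up for the
sake of illustration).  It assumes the buffer is wired and virtually
contiguous, and it coalesces chunks that happen to be physically adjacent:

#include <sys/param.h>
#include <sys/systm.h>
#include <vm/vm.h>			/* for vtophys() */
#include <vm/pmap.h>			/* for vtophys() */

struct phys_seg {
	vm_offset_t	ps_addr;	/* physical start of the segment */
	vm_size_t	ps_len;		/* length of the segment in bytes */
};

/*
 * Walk a wired, virtually contiguous buffer and fill in one physical
 * segment per physically discontiguous chunk.  Returns the number of
 * segments used, or -1 if more than maxsegs would be needed.
 */
static int
m_extract_phys_segs(caddr_t buf, int len, struct phys_seg *segs, int maxsegs)
{
	vm_offset_t pa, nextpa;
	int chunk, nsegs;

	nsegs = 0;
	nextpa = 0;
	while (len > 0) {
		/* Bytes left in the current virtual page. */
		chunk = PAGE_SIZE - ((vm_offset_t)buf & PAGE_MASK);
		if (chunk > len)
			chunk = len;
		pa = vtophys(buf);
		if (nsegs > 0 && pa == nextpa) {
			/* Physically adjacent to the previous chunk. */
			segs[nsegs - 1].ps_len += chunk;
		} else {
			if (nsegs == maxsegs)
				return (-1);
			segs[nsegs].ps_addr = pa;
			segs[nsegs].ps_len = chunk;
			nsegs++;
		}
		nextpa = pa + chunk;
		buf += chunk;
		len -= chunk;
	}
	return (nsegs);
}

A driver could then just call something like that with its own S/G limit
and fall back to copying if it gets -1 back.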

We also may want to examine what sort of guarantees, if any, we can make
about the physical page alignment of the allocated mbuf.  For instance, if
we can guarantee that the mbuf data segment starts on a physical page
boundary (when it is at least a page in size), then device drivers could
guarantee that a jumbo frame (9000 bytes) fits into 3 scatter/gather
segments on an i386.
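
To spell out the arithmetic for the page-aligned case (i386 page size of
4096 bytes):

	4096 + 4096 + 808 = 9000	->  3 S/G segments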

The number of scatter/gather segments used is important for some boards,
like ti(4), because they have a limited number of scatter/gather segments
available.  In the case of ti(4), it is 4 S/G segments, which is enough to
handle the maximum number of physical data chunks it would take to compose
a 9K virtual buffer.  (You could start in the middle of a page, have two
complete pages, and then end with a partial page.)
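
Just to illustrate that, here is a quick userland sketch (nothing
ti(4)-specific, and it assumes the i386 page size of 4096 bytes) that
counts how many pages, and therefore worst-case S/G segments, a 9000-byte
buffer spans depending on where it starts within a page:

#include <stdio.h>

#define PAGE_SIZE	4096	/* i386 page size assumed */
#define JUMBO_LEN	9000	/* jumbo frame payload */

/*
 * Number of pages touched by a len-byte buffer that starts at byte
 * offset 'off' within its first page.
 */
static int
nsegs(int off, int len)
{
	return ((off + len - 1) / PAGE_SIZE + 1);
}

int
main(void)
{
	printf("page-aligned: %d segments\n", nsegs(0, JUMBO_LEN));
	/* Worst case, 1 byte left in the first page:
	 * 1 + 4096 + 4096 + 807 -> 4 segments, which is ti(4)'s limit. */
	printf("worst case:   %d segments\n", nsegs(PAGE_SIZE - 1, JUMBO_LEN));
	return (0);
}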

I suppose it would be good to see which NIC drivers in the tree can receive
into or send from multiple chunks of data, and what their requirements are
(how many scatter/gather segments they can handle, what the maximum MTU is,
etc.).

Ken
-- 
Kenneth Merry
ken@kdm.org
