From owner-freebsd-current  Wed Nov 13 08:22:57 1996
Return-Path: owner-current
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id IAA26456
          for current-outgoing; Wed, 13 Nov 1996 08:22:57 -0800 (PST)
Received: from premise.CS.Berkeley.EDU (root@premise.CS.Berkeley.EDU [128.32.33.172])
          by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id IAA26440;
          Wed, 13 Nov 1996 08:22:39 -0800 (PST)
Received: from premise.CS.Berkeley.EDU (bmah@localhost.Berkeley.EDU [127.0.0.1]) by premise.CS.Berkeley.EDU (8.8.2/8.8.2) with ESMTP id IAA03524; Wed, 13 Nov 1996 08:22:30 -0800 (PST)
Message-Id: <199611131622.IAA03524@premise.CS.Berkeley.EDU>
X-Mailer: exmh version 1.6.9 8/22/96
To: Chris Csanady <ccsanady@friley216.res.iastate.edu>
cc: dyson@freebsd.org, gibbs@freefall.freebsd.org (Justin T. Gibbs),
        roberto@keltia.freenix.fr, freebsd-current@freebsd.org
Subject: Re: pbufs (was: Re: ufs is too slow?) 
In-reply-to: Your message of "Wed, 13 Nov 1996 07:15:18 CST."
             <199611131315.HAA03860@friley216.res.iastate.edu> 
From: bmah@cs.berkeley.edu (Bruce A. Mah)
Reply-to: bmah@cs.berkeley.edu
X-Face: g~c`.{#4q0"(V*b#g[i~rXgm*w;:nMfz%_RZLma)UgGN&=j`5vXoU^@n5<Pi&akO)o^8;[r
 %l(8ZHlbF`dD>v4:OO)c["!w)nD/!!~e4Sj7LiT'6*wZ83454H""lb{CC%T37O!!'S$S&D}sem7I[A
 2V%N&+
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 13 Nov 1996 08:22:26 -0800
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Chris Csanady writes:

> VJ's pbufs?  ive not heard of these before, what are they?  i remember hearing
> something about him rewriting the tcp/ip stack to work well on gigabit 
> networks...  this wouldnt have anything to do with it, would it?  probably
> not i suppose, but if anyone could point me to papers on anything related, id
> apreciate it. :)

I don't think Van has written any papers on pbufs, but he's given a few talks 
on them.  Going largely from memory:

One of the losses of the current BSD TCP/IP implementation is that the unit of 
memory allocation (mbufs) doesn't really match the unit of network 
transmission.  You'd like to allocate memory in packet- (or frame-) sized 
chunks, e.g. ~1500 bytes for Ethernet, rather than small mbufs (~112 bytes) or 
page mbufs (4K) which require a lot of copying and munging around of data.  So 
the idea behind pbufs is that the lower protocol layers expose enough details 
to higher layers (e.g. the socket layer and TCP) to make this feasible.
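
To put rough numbers on that mismatch, here is a small C sketch.  The sizes 
are just the approximate figures quoted above, not real kernel constants, and 
the helper names (bufs_needed, slack_bytes) are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Approximate sizes from the discussion above; NOT actual kernel constants. */
enum {
    SMALL_MBUF_DATA = 112,   /* rough data capacity of a small mbuf */
    PAGE_MBUF_DATA  = 4096,  /* page-sized mbuf */
    ETHER_FRAME     = 1500   /* rough Ethernet frame payload */
};

/* Number of fixed-size buffers needed to hold len bytes (ceiling division). */
static size_t bufs_needed(size_t len, size_t buf_size)
{
    assert(buf_size > 0);
    return (len + buf_size - 1) / buf_size;
}

/* Bytes allocated but left unused once len bytes are stored. */
static size_t slack_bytes(size_t len, size_t buf_size)
{
    return bufs_needed(len, buf_size) * buf_size - len;
}
```

With these figures, a 1500-byte frame needs a chain of 14 small mbufs (68 
bytes unused in the last one), or a single 4K page mbuf with 2596 bytes 
unused.  A frame-sized buffer holds it in one piece with nothing left over.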

This also dovetails rather nicely with some assertions he's made about putting 
memory on network interfaces and mapping them into kernel memory (e.g. using 
the card's memory as socket/protocol buffers).  One example of this is the 
Medusa/Afterburner series of high-speed network interfaces from HP/HP Labs.

(Editorial note:  Packet traces have shown that many packets, at least on 
LANs, tend to be small.  So it's not clear to me what effect this would have 
for "typical" network traffic, though the wins for large bulk transfers have 
been shown to be substantial.)

Some of the other modifications also streamlined the protocol processing by 
combining layers (in the implementation, not protocol design).  By doing 
several layers at once you can save overhead.  Layering is great when 
designing a protocol stack, but many times not so great when you go to build 
something.
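
As a generic illustration of doing two steps in one pass (this is not Van's 
code, and I'm not claiming it's what his stack does), consider copying a 
buffer and computing a 16-bit ones'-complement sum over it.  The function 
names below are hypothetical.  The layered version walks the data twice; the 
combined version touches each byte once:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones'-complement sum. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Layered version: one pass to copy, then a second pass to sum. */
static uint16_t copy_then_sum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(len <= 65535);            /* keeps the 32-bit accumulator safe */
    for (i = 0; i < len; i++)
        dst[i] = src[i];
    for (i = 0; i + 1 < len; i += 2) /* big-endian 16-bit words */
        sum += ((uint32_t)dst[i] << 8) | dst[i + 1];
    if (len & 1)                     /* odd trailing byte, zero-padded */
        sum += (uint32_t)dst[len - 1] << 8;
    return fold(sum);
}

/* Combined version: copy and sum in the same loop. */
static uint16_t copy_and_sum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(len <= 65535);
    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (len & 1) {
        dst[len - 1] = src[len - 1];
        sum += (uint32_t)src[len - 1] << 8;
    }
    return fold(sum);
}
```

Both functions produce the same copy and the same sum.  The only difference 
is how many times the data is traversed, which is the kind of overhead that 
merging layers in the implementation is meant to save.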

Somewhere in the mess I call a desk I think I have hardcopies from a talk he 
gave on this stuff at a Gigabit TCP workshop a few years ago, but a cursory 
search has not revealed them.

Bruce.