Date:      Sun, 6 Jan 2008 13:47:24 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        arch@FreeBSD.org
Cc:        kmacy@FreeBSD.org, net@FreeBSD.org
Subject:   Network device driver KPI/ABI and TOE
Message-ID:  <20080106124517.G105@fledge.watson.org>

Dear all,

Last month, Kip Macy committed support for TCP offload to the FreeBSD CVS 
repository for the Chelsio 10 Gbps device driver.  We've had interest from 
other vendors in supporting TOE on FreeBSD, although it remains unclear as yet 
which will end up supporting it.  This e-mail is about how we want to treat 
the TOE interface with respect to third party device driver support, and more 
specifically to propose that we not consider the TOE interface to be part of 
our stable network device driver KPI/ABI once it appears in a RELENG_X branch.

The background: in the last few FreeBSD versions (late 5.x, 6.x, 7.x), we've 
attempted to offer network and storage device driver authors a stable KPI and 
ABI across minor FreeBSD releases.  The goal of this has been to allow authors 
to produce a device driver module for a .0 release, and then have it continue 
to function for .1, .2, and so on.  We've not attempted to formalize the 
details of this for network device drivers, but implicitly this includes 
interface stability for things like mbuf and memory management routines, the 
ifnet interface, locking interfaces and data structures, newbus, busdma, and 
so on.  If we had to, we would break the ABI in order to fix critical bugs 
(etc), but we try hard to avoid it in order to improve interface stability, 
and, in general, we choose not to MFC features that would break existing 
device drivers.

TOE comes with a series of defined interfaces in toedev.h (documentation 
forthcoming) and tcp_offload.h (documentation now in comments).  However, TOE 
implementations must also interact directly with the TCP and other stack 
internals, including directly accessing socket buffers, routing, the inpcb and 
tcpcb data structures, TCP and inpcb locking protocols, and so on. This 
happens for two reasons:

- First, TOE needs to interact with the contents of sockets and TCP in order
   to implement the offload (i.e., extracting data from socket buffers to
   transmit it, putting data into socket buffers on receive, accessing TCP
   connection properties such as socket options, address bindings, listen
   state, etc).

- Second, TOE hardware implementations often don't implement all of TCP: they
   may implement the steady state but not TCP TIMEWAIT or connection setup, for
   example.

To get a sense of the level of intimacy of one such driver, it's well worth 
perusing src/sys/dev/cxgb/ulp/tom in HEAD.  This is not a criticism, but I do 
want people to be aware of what's there before getting involved in this 
discussion: TOE takes to a whole new level the mantra that layering is good 
for protocol design, but not good for implementation performance, and spans 
pretty much all layers of the network stack in its scope.

There are serious ABI implications to this approach, as historically we've 
made significant changes to the TCP and socket buffer internals during -stable 
branches, such as optimizing performance, adding new TCP features, etc. 
There's a fairly aggressive list of forthcoming TCP features for 8.0 with MFC 
plans for several of them, such as congestion control selection and multiple 
routing tables. I've not attempted to analyze these past or proposed changes 
in detail to determine how disruptive they would be to a TOE implementation, 
but my guess is that they might well break TOE drivers, especially historic 
ones, had TOE been supported at the time.
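
To make the ABI concern concrete, here is a minimal userland C sketch.  The 
structure names and fields (tcpcb_v1, tcpcb_v2, t_flags2) are hypothetical 
stand-ins, not the real tcpcb layout; the point is only that a module compiled 
against one layout keeps using the old field offsets after a field is added:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout a TOE module was compiled against (a .0 release). */
struct tcpcb_v1 {
	uint32_t t_state;
	uint32_t snd_cwnd;
};

/* Hypothetical layout after an MFC inserts a field (a later .1 release). */
struct tcpcb_v2 {
	uint32_t t_state;
	uint32_t t_flags2;	/* newly added field, assumed for illustration */
	uint32_t snd_cwnd;
};

/* Offset of snd_cwnd as seen by code built against each layout. */
static size_t
cwnd_offset_v1(void)
{
	return (offsetof(struct tcpcb_v1, snd_cwnd));
}

static size_t
cwnd_offset_v2(void)
{
	return (offsetof(struct tcpcb_v2, snd_cwnd));
}

/*
 * What an old binary module effectively does: read snd_cwnd at the offset
 * that was baked in when it was compiled against tcpcb_v1.
 */
static uint32_t
stale_read_cwnd(const void *tp)
{
	uint32_t v;

	memcpy(&v, (const char *)tp + offsetof(struct tcpcb_v1, snd_cwnd),
	    sizeof(v));
	return (v);
}
```

Handed a tcpcb_v2, stale_read_cwnd() returns the contents of t_flags2 rather 
than the congestion window, with no compile-time or load-time error to flag 
the mismatch.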

My proposal, and this is really a proposal to drive discussion as much as a 
proposal for a policy, is that the internal TCP data structures exported via 
the TOE interfaces and accessed by TOE device drivers *not* be considered 
ABI/KPI-stable in -STABLE branches.  While I think we shouldn't intentionally 
change them to break TOE, it's unrealistic to expect that these network stack 
internals won't change as part of normal maintenance and feature development 
that take place in -STABLE branches.

For those who aren't involved in those day-to-day internals, a comparable 
situation might be if a CAM SCSI storage driver were dependent not only on 
there being no changes made to the on-disk layout of UFS (even backwards 
compatible ones), but also the in-memory data structures of soft updates. Any 
significant changes to soft updates internals would break such device drivers 
due to a requirement for forward compatibility.  In some ways this isn't a 
perfect comparison, as soft updates isn't under active development, but from a 
layering and abstraction perspective, it's quite similar.

We don't yet ship TOE in a -STABLE branch, but I believe Kip hopes to MFC TOE 
support, and with other device driver vendors starting to take a look, I think 
we want our thoughts on the table regarding this matter.  I presume that we'll 
see the TOE interfaces continue to evolve over the next 6-18 months, and we 
should make sure that we know whether or not third party device driver authors 
can expect ABI/KPI stability before, rather than after, it hits a -STABLE 
branch.  On a similar note, these necessary changes to network stack internals 
will result in modifications to in-tree device drivers, so device driver 
authors who implement TOE should expect to see the TOE parts of their drivers 
being significantly modified as development occurs on those other parts of the 
stack.

There's also the opportunity to think about whether it's possible to harden 
things in such a way as to not give up our flexibility to keep maintaining 
and improving TCP (and other related subsystems), yet improve the quality of 
life for a third party TOE driver maintainer.  For example, might we provide 
accessor routines for certain data structures, or attempt to structure things 
to hide more of TCP locking from a TOE implementation?  Should we suggest that 
non-native TOE implementations rely less on our TCP code and provide their own 
where the hardware doesn't provide a complete implementation, in order to 
avoid building dependency on things that we know will change?
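
As a rough illustration of the accessor idea, the sketch below is userland C 
with entirely hypothetical names (tcpcb_sketch, toe_tcp_get_cwnd, 
toe_tcp_set_cwnd); nothing like this exists in toedev.h or tcp_offload.h 
today.  It assumes the driver sees only an opaque structure declaration plus 
function prototypes, so the stack remains free to rearrange the structure:

```c
#include <stdint.h>

/* ---- What a driver-visible header might expose (hypothetical) ---- */
struct tcpcb_sketch;			/* opaque to TOE drivers */
uint32_t toe_tcp_get_cwnd(const struct tcpcb_sketch *tp);
void	 toe_tcp_set_cwnd(struct tcpcb_sketch *tp, uint32_t cwnd);

/* ---- Stack-private definition; layout may change between releases ---- */
struct tcpcb_sketch {
	uint32_t t_state;
	uint32_t t_flags2;	/* a later addition the driver never sees */
	uint32_t snd_cwnd;
};

uint32_t
toe_tcp_get_cwnd(const struct tcpcb_sketch *tp)
{
	return (tp->snd_cwnd);
}

void
toe_tcp_set_cwnd(struct tcpcb_sketch *tp, uint32_t cwnd)
{
	tp->snd_cwnd = cwnd;
}

/* ---- Driver-side code: touches the structure only via accessors ---- */
static uint32_t
toe_driver_halve_cwnd(struct tcpcb_sketch *tp)
{
	uint32_t half = toe_tcp_get_cwnd(tp) / 2;

	toe_tcp_set_cwnd(tp, half);
	return (half);
}
```

In a real split the structure definition would live in a stack-private file, 
so a driver binary would depend only on the function symbols and not on field 
offsets.  An accessor would also be a natural place to acquire or assert the 
relevant lock, which bears on the locking question above, at the cost of a 
function call on paths that currently use direct field access.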

Robert N M Watson
Computer Laboratory
University of Cambridge


