From: Robert Watson
Date: Sun, 6 Jan 2008 13:47:24 +0000 (GMT)
To: arch@FreeBSD.org
Cc: kmacy@FreeBSD.org, net@FreeBSD.org
Subject: Network device driver KPI/ABI and TOE
Message-ID: <20080106124517.G105@fledge.watson.org>
List-Id: Discussion related to FreeBSD architecture

Dear all,

Last month, Kip Macy committed support for TCP offload to the FreeBSD CVS repository for the Chelsio 10gbps device driver. We've had interest from other vendors in supporting TOE on FreeBSD, although it remains unclear as yet which will end up supporting it. This e-mail is about how we want to treat the TOE interface with respect to third party device driver support, and more specifically to propose that we not consider the TOE interface to be part of our stable network device driver KPI/ABI once it appears in a RELENG_X branch.

The background: in the last few FreeBSD versions (late 5.x, 6.x, 7.x), we've attempted to offer network and storage device driver authors a stable KPI and ABI across minor FreeBSD releases.
The goal of this has been to allow authors to produce a device driver module for a .0 release, and then have it continue to function for .1, .2, and so on. We've not attempted to formalize the details of this for network device drivers, but implicitly this includes interface stability for things like mbuf and memory management routines, the ifnet interface, locking interfaces and data structures, newbus, busdma, and so on. If we had to, we would break the ABI in order to fix critical bugs (etc), but we try hard to avoid it in order to improve interface stability, and, in general, we choose not to MFC features that would break existing device drivers.

TOE comes with a series of defined interfaces in toedev.h (documentation forthcoming) and tcp_offload.h (documentation now in comments). However, TOE implementations must also interact directly with the TCP and other stack internals, including directly accessing socket buffers, routing, the inpcb and tcpcb data structures, TCP and inpcb locking protocols, and so on. This happens for two reasons:

- First, TOE needs to interact with the contents of sockets and TCP in order to implement the offload (i.e., extracting data from socket buffers to transmit it, putting data into socket buffers on receive, accessing TCP connection properties such as socket options, address bindings, listen state, etc).

- Second, TOE hardware implementations often don't implement all of TCP: they may implement the steady state but not TCP TIMEWAIT or connection setup, for example.

To get a sense of the level of intimacy of one such driver, it's well worth perusing src/sys/dev/cxgb/ulp/tom in HEAD. This is not a criticism, but I do want people to be aware of what's there before getting involved in this discussion: TOE takes to a whole new level the mantra that layering is good for protocol design, but not good for implementation performance, and spans pretty much all layers of the network stack in its scope.
There are serious ABI implications to this approach, as historically we've made significant changes to the TCP and socket buffer internals during -stable branches, such as optimizing performance, adding new TCP features, etc. There's a fairly aggressive list of forthcoming TCP features for 8.0 with MFC plans for several of them, such as congestion control selection and multiple routing tables. I've not attempted to analyze these past or proposed changes in detail to determine how disruptive they would be to a TOE implementation, but my guess is that they might well break TOE drivers, especially historic ones, had TOE been supported at the time.

My proposal, and this is really a proposal to drive discussion as much as a proposal for a policy, is that the internal TCP data structures exported via the TOE interfaces and accessed by TOE device drivers *not* be considered ABI/KPI-stable in -STABLE branches. While I think we shouldn't intentionally change them to break TOE, it's unrealistic to expect that these network stack internals won't change as part of the normal maintenance and feature development that take place in -STABLE branches.

For those who aren't involved in those day-to-day internals, a comparable situation might be if a CAM SCSI storage driver were dependent not only on there being no changes made to the on-disk layout of UFS (even backwards compatible ones), but also on the in-memory data structures of soft updates. Any significant changes to soft updates internals would break such device drivers due to a requirement for forward compatibility. In some ways this isn't a perfect comparison, as soft updates isn't under active development, but from a layering and abstraction perspective, it's quite similar.

We don't yet ship TOE in a -STABLE branch, but I believe Kip hopes to MFC TOE support, and with other device driver vendors starting to take a look, I think we want our thoughts on the table regarding this matter.
I presume that we'll see the TOE interfaces continue to evolve over the next 6-18 months, and we should make sure that we know whether or not third party device driver authors can expect ABI/KPI stability before, rather than after, it hits a -STABLE branch.

On a similar note, these necessary changes to network stack internals will result in modifications to in-tree device drivers, so device driver authors who implement TOE should expect to see the TOE parts of their drivers being significantly modified as development occurs on those other parts of the stack.

There's also the opportunity to think about whether it's possible to harden things in such a way as to not give up our flexibility to keep maintaining and improving TCP (and other related subsystems), while still improving the quality of life for a third party TOE driver maintainer. For example, might we provide accessor routines for certain data structures, or attempt to structure things to hide more of TCP locking from a TOE implementation? Should we suggest that non-native TOE implementations rely less on our TCP code and provide their own where the hardware doesn't provide a complete implementation, in order to avoid building dependency on things that we know will change?

Robert N M Watson
Computer Laboratory
University of Cambridge