Date:      Sat, 15 Dec 2007 19:16:22 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Kip Macy <kip.macy@gmail.com>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>, freebsd-arch@freebsd.org
Subject:   Re: pending changes for TOE support
Message-ID:  <20071215190252.I85668@fledge.watson.org>
In-Reply-To: <b1fa29170712151040icb371efseaf61d9b79907b24@mail.gmail.com>
References:  <b1fa29170712121303x537fd11fj4b8827bb353ad8e4@mail.gmail.com>  <b1fa29170712150057m690bd36bm7a1910969e92293b@mail.gmail.com>  <20071215100351.Q70617@fledge.watson.org> <b1fa29170712151040icb371efseaf61d9b79907b24@mail.gmail.com>

On Sat, 15 Dec 2007, Kip Macy wrote:

> The current implementation bypasses the firewall. This and likely other 
> hardware has extensive filtering support so it isn't necessarily intrinsic.

I'm not sure I agree when it comes to features like DUMMYNET, NAT, BPF, etc. 
TCP offload completely bypasses, by its very intent, most of the network 
stack.

> The usage model at this moment is that the customer makes a conscious 
> decision to load the TOE driver and understands the implications. I think 
> this is quite adequate for 10GigE cards currently. However, this will need 
> to be revisited when these chips start showing up on mainstream 
> motherboards.

I think I would prefer that our policy switch be the capenable flag, so that 
compiling things in or out (or loading, which is the logical equivalent) 
doesn't change functional behavior for existing interfaces.
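
To illustrate the capenable idea, here is a minimal userspace sketch.  The 
sk_ifnet structure, the SK_IFCAP_TOE bit and sk_toe_allowed() are hypothetical 
stand-ins, not the real ifnet definitions; the only assumption carried over 
is that an interface has one mask for what the hardware can do and another 
for what the administrator has enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit; not a real FreeBSD constant. */
#define SK_IFCAP_TOE 0x1u

/* Hypothetical stand-in for the two capability masks of an interface. */
struct sk_ifnet {
    uint32_t if_capabilities; /* what the hardware supports */
    uint32_t if_capenable;    /* what the administrator enabled */
};

/*
 * Offload is permitted only when the capability is both supported and
 * enabled.  Whether a TOE driver is compiled in or loaded plays no part
 * in the decision.
 */
static bool
sk_toe_allowed(const struct sk_ifnet *ifp)
{
    assert(ifp != NULL);
    return ((ifp->if_capabilities & ifp->if_capenable & SK_IFCAP_TOE) != 0);
}
```

With a check of this shape, loading the driver leaves existing interfaces 
behaving as before until someone explicitly turns the flag on.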

>> While I'm familiar with TCP, I'm less familiar with the scope of what cards 
>> support for TOE.  Do we know of any cards that are less capable than the 
>> Chelsio card in this respect, or are they all sort of on-par on that front? 
>> I.e., do we think the above eventuality is likely?
>
> I don't have any way of knowing. I think it is probably safe to say that any 
> vendors that don't meet those criteria now will in the future as transistor 
> density increases.

I think it behooves us to find out, given that we're designing a KPI for those 
cards also.  I agree with the transistor argument, and given that TOE is a 
fairly undeployed technology at this point, it may quickly resolve itself if 
it hasn't.

>> If we don't, then one of the things I'd like to see us do is fairly 
>> carefully assert, at least for a few months, that TCP never "slips" into 
>> any transmission-related paths that could lead to truly odd and 
>> hard-to-diagnose behavior when running with TOE.  I.e., tcp_output, etc.
>
> I'm happy to do that. However, I see problems introduced by offloading 
> connections as being driver bugs much the same as problems caused by the 
> driver's TCP segmentation offload or checksum offload. The problems will be 
> isolated to connections using a specific interface.

Interesting point -- it's amazing how broken checksum processing can be, and 
TCP is many orders of magnitude more complex.
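
The kind of assertion discussed above might look roughly like the following 
sketch.  Everything here is hypothetical: SK_TF_TOE, sk_tcpcb and 
sk_tcp_output() are illustrative names, and the userspace assert() stands in 
for whatever kernel assertion would actually be used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical "this connection is offloaded" flag. */
#define SK_TF_TOE 0x1u

/* Hypothetical, heavily reduced TCP control block. */
struct sk_tcpcb {
    unsigned int t_flags;
};

static bool
sk_is_offloaded(const struct sk_tcpcb *tp)
{
    return ((tp->t_flags & SK_TF_TOE) != 0);
}

/*
 * Stand-in for the software transmit path.  An offloaded connection
 * must never reach it, so fail loudly instead of sending segments that
 * the card knows nothing about.
 */
static int
sk_tcp_output(struct sk_tcpcb *tp)
{
    assert(!sk_is_offloaded(tp));
    /* ... software segment construction would go here ... */
    return (0);
}
```

The same check could be repeated in the other transmission-related paths for 
the first few months and removed once confidence is established.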

>>      the socket code, both for sending/receiving.  You talk a bit about
>>      "credit", but introducing it up-front would be useful.
>
> I didn't realize a definition was necessary. To the best of my knowledge 
> this is the common term used when discussing flow control. I've seen it used 
> for Fibre Channel and IB. The one ambiguity that arises is whether or not it 
> refers to bytes or segments.

I think a phrase wouldn't hurt; also, I notice you only addressed flow 
control in one direction in the comments, which is why I mentioned both 
sending and receiving.  The clearer we make this, the happier we'll be.  I 
suspect we'll actually want to move a lot of this text from the include file 
to the man page for the TOE interface...
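
As a sketch of what "credit" could mean once it is spelled out for both 
directions, assuming it is counted in bytes rather than segments: the 
structures and function names below are hypothetical and say nothing about 
how any particular card accounts for credit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-based credit counter for one direction. */
struct sk_credit {
    size_t avail; /* bytes that may still be handed over */
    size_t limit; /* upper bound on avail */
};

/* One counter per direction, so neither is left implicit. */
struct sk_toe_credits {
    struct sk_credit tx; /* host -> card: bytes the card will accept */
    struct sk_credit rx; /* card -> host: bytes the socket will accept */
};

/* Consume up to 'want' bytes of credit; returns the bytes granted. */
static size_t
sk_credit_take(struct sk_credit *c, size_t want)
{
    size_t n = want < c->avail ? want : c->avail;

    c->avail -= n;
    return (n);
}

/* Return 'n' bytes of credit, clamped so avail never exceeds limit. */
static void
sk_credit_give(struct sk_credit *c, size_t n)
{
    size_t room;

    assert(c->avail <= c->limit);
    room = c->limit - c->avail;
    c->avail += n < room ? n : room;
}
```

In this sketch the sender takes tx credit before queueing data and gets it 
back as data is acknowledged; rx credit is taken as data is delivered and 
returned as the application drains the socket buffer.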

>> (3) Could you talk at a high level about the ways in which TOE drivers will
>>      interact with TCP?  You do it a bit in each of the sections, but if
>>      there's a principle, pulling it out would be useful.  Also, you should
>>      indicate whether the driver is allowed to drop the inpcb lock or not.
>
> I've done my best to minimize changes to TCP. It is safe to assume that the 
> invariants are the same as those for tcp_output. I think we should ask the 
> author of tcp_output to document the interface, expected state transitions, 
> and its invariants (joke).

:-P

Documenting locking semantics such as "You can rely on lock X being held, but 
do not drop it" takes an extra phrase and can save someone a lot of time.
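
For example, a documented entry point might read something like the sketch 
below.  The lock, the structure and sk_toe_send() are all hypothetical, and 
the boolean "held" flag merely stands in for a real lock so the contract can 
be checked in userspace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock stand-in that only records whether it is held. */
struct sk_lock {
    bool held;
};

/* Hypothetical, heavily reduced protocol control block. */
struct sk_inpcb {
    struct sk_lock inp_lock;
    int            sent; /* number of send requests handled */
};

/*
 * sk_toe_send: hypothetical driver entry point.
 *
 * Locking: the caller holds inp_lock on entry.  The driver may rely on
 * it being held, but must not drop it; it is still held on return.
 */
static int
sk_toe_send(struct sk_inpcb *inp)
{
    assert(inp->inp_lock.held);  /* held on entry */
    inp->sent++;                 /* driver work under the lock */
    assert(inp->inp_lock.held);  /* still held on return */
    return (0);
}
```

The comment is the extra phrase; the assertions simply make it checkable.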

>> I'm a bit confused by the description of the error condition here.  Could 
>> you clarify when a driver should return an error, and what the impact of an 
>> error returned will be on the connection state?  In fact, it probably makes 
>> sense to have an up-front comment on conventions for error-handling -- if 
>> TOE returns an error will that generally lead to a TCP tear-down?
>
> The offload routines are substituted for tcp_output and thus should interact 
> with the stack in the same way. By extension they should have the same 
> failure modes and invariants.

Most driver authors will not be intimately familiar with tcp_output()'s 
subtleties, and documenting error-handling for a KPI is always a good idea.

> The interface is intended to drop in the place of tcp_output.
<"see what tcp_output does" repeated many times>

tcp_output() was previously an internal function of the TCP code, and now the 
semantics are being exposed to device drivers.  Let's not perpetuate poorly 
documented driver interfaces by adding another one.  I think it would be a 
reasonable expectation of a driver author to have consistent documentation of 
the life cycle of data structures and objects, locking expectations and 
requirements, and the semantics for error values from functions.  Certainly, 
they need to look at TCP a fair amount because they'll be pulling things out 
of inpcb, tcpcb, etc, but I'd rather we limit that requirement to simple 
things (addresses, socket options) that are relatively static and avoid it 
being for complex things (locking, error handling) that tend to be more 
subject to change.  Also, if you document what you think the behavior is or 
should be, we can then check to see if we agree.
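
To make the request concrete, here is one possible shape for a documented 
error convention.  This is purely illustrative: the method table, the names, 
and in particular the choice that ENOBUFS is transient while every other 
error is fatal are assumptions made for the sketch, not a description of 
what tcp_output() or the proposed interface actually does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced connection state. */
struct sk_conn {
    int so_error;  /* error reported to the application */
    int torn_down; /* nonzero once the connection is dropped */
};

/* Hypothetical driver method table with its error contract spelled out. */
struct sk_toe_ops {
    /*
     * Returns 0 on success.
     * ENOBUFS: transient; connection state is unchanged and the
     *          caller may retry later.
     * Any other errno: fatal; the caller tears the connection down.
     */
    int (*send)(struct sk_conn *);
};

/* Two example driver stubs, one per failure class. */
static int sk_send_nobufs(struct sk_conn *c) { (void)c; return (ENOBUFS); }
static int sk_send_reset(struct sk_conn *c) { (void)c; return (ECONNRESET); }

/* Caller side: apply the documented convention to the return value. */
static int
sk_handle_send(const struct sk_toe_ops *ops, struct sk_conn *c)
{
    int error;

    assert(ops != NULL && ops->send != NULL);
    error = ops->send(c);
    if (error != 0 && error != ENOBUFS) {
        c->so_error = error;
        c->torn_down = 1;
    }
    return (error);
}
```

Whatever the real convention turns out to be, writing it down next to the 
method is what lets a driver author and the TCP code agree on it.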

Robert N M Watson
Computer Laboratory
University of Cambridge


