Date:      Tue, 27 May 2003 08:29:19 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Peter Jeremy <peterjeremy@optushome.com.au>
Cc:        arch@freebsd.org
Subject:   Re: sendfile(2) SF_NOPUSH flag proposal
Message-ID:  <3ED3844F.713FB360@mindspring.com>
References:  <20030526201740.GA22178@cirb503493.alcatel.com.au> <20030527102806.GC44520@cirb503493.alcatel.com.au>

Peter Jeremy wrote:
> On Tue, May 27, 2003 at 11:57:20AM +0400, Igor Sysoev wrote:
> >I thought about it more and I agree with you. TF_NOPUSH should be turned on
> >at the start of a transaction and turned off at the end of a transaction.
> >
> >So I think there should be two flags:
> >SF_NOPUSH - it turns TF_NOPUSH on before the sending. It's cheap:
> >SF_PUSH - it turns TF_NOPUSH off after the sending has been completed.
> 
> I agree that the code appears trivial but in order to justify its
> inclusion, you will need to demonstrate that there is some benefit to
> FreeBSD to implement this code.  Good justification would be:
> 
> 1) The same API is implemented somewhere else (or there is agreement
>    between multiple groups to implement it).  I don't believe this
>    functionality is implemented anywhere else and you've not provided
>    any evidence that any other groups are considering such functionality.

Actually, the functionality can be implemented *without* going
and implementing the API.  It should really be controlled already
by the TCP_NODELAY option *not* having been set by the user, and,
for last-block next-first-block coalescing, by TCP_NOPUSH *having*
been set.
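
To make that concrete, here is a minimal sketch of the per-socket
state I'm describing: TCP_NODELAY left at its default (off), and the
coalescing option switched on once and left alone.  The helper names
are hypothetical.  TCP_NOPUSH is the BSD option; the fallback to
TCP_CORK on systems that lack TCP_NOPUSH is my assumption that the
two are close enough for illustration, not a claim that they behave
identically.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* TCP_NOPUSH is the BSD name.  Falling back to TCP_CORK elsewhere is an
 * assumption made only so the sketch builds on non-BSD systems. */
#if defined(TCP_NOPUSH)
#define COALESCE_OPT TCP_NOPUSH
#elif defined(TCP_CORK)
#define COALESCE_OPT TCP_CORK
#else
#error "no TCP_NOPUSH or TCP_CORK on this platform"
#endif

/* Hypothetical helper: turn coalescing on (1) or off (0).
 * Returns 0 on success, -1 on error, as setsockopt(2) does. */
static int set_coalesce(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, COALESCE_OPT, &on, sizeof(on));
}

/* Hypothetical helper: returns 1 if coalescing is on, 0 if off, -1 on error. */
static int get_coalesce(int fd)
{
    int v = 0;
    socklen_t len = sizeof(v);

    if (getsockopt(fd, IPPROTO_TCP, COALESCE_OPT, &v, &len) == -1)
        return -1;
    return v != 0;
}

/* Hypothetical helper: returns 1 if the user set TCP_NODELAY, 0 if it was
 * left at its default, -1 on error. */
static int nodelay_is_set(int fd)
{
    int v = 0;
    socklen_t len = sizeof(v);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &v, &len) == -1)
        return -1;
    return v != 0;
}
```

The idea would be to call set_coalesce(s, 1) once when the connection
is set up, never touch TCP_NODELAY, and then just issue the sends.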

Basically, the stack is misbehaving on us in a minor way in the
sendfile case; effectively, it's unintentionally emitting up to one
extra partial packet at each boundary: between the user-supplied
header (if any) and the file content, and between the file content
and the user-supplied trailer (if any).

It's nothing to be terrifically concerned about, unless you are
paying by the packet, you keep your connections open a very long
time (e.g. HTTP/1.1), such that the amortized packet count is
relatively high, and your files, headers, and trailers are tiny
enough that the frags constitute a significant portion of your
packet traffic.

In other words, you have to win the lottery.  8-).
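
For a rough feel for the numbers, here is a back-of-the-envelope
sketch.  It assumes a deliberately simplified model that is mine, not
a measurement: header, file content, and trailer each get flushed on
their own in the worst case, versus being packed back to back in the
coalesced case, with a fixed segment size and no other traffic on the
connection.  All of the function names are hypothetical.

```c
#include <stddef.h>

/* Packets needed to carry n bytes at a given segment size
 * (0 bytes needs 0 packets). */
static size_t packets_for(size_t n, size_t mss)
{
    return (n + mss - 1) / mss;
}

/* Worst case in this model: header, file, and trailer are each
 * flushed separately, so each piece may end in a short packet. */
static size_t packets_separate(size_t hdr, size_t file, size_t trl, size_t mss)
{
    return packets_for(hdr, mss) + packets_for(file, mss) + packets_for(trl, mss);
}

/* Best case in this model: all three pieces are packed back to back. */
static size_t packets_coalesced(size_t hdr, size_t file, size_t trl, size_t mss)
{
    return packets_for(hdr + file + trl, mss);
}

/* Fraction of the worst-case packet count that coalescing would save. */
static double extra_fraction(size_t hdr, size_t file, size_t trl, size_t mss)
{
    size_t sep = packets_separate(hdr, file, trl, mss);
    size_t coal = packets_coalesced(hdr, file, trl, mss);

    if (sep == 0)
        return 0.0;
    return (double)(sep - coal) / (double)sep;
}
```

With a 1460 byte segment size, a 300 byte header, a 1000 byte file,
and a 100 byte trailer come out as 3 packets versus 1.  With the same
header, a 10,000,000 byte file, and no trailer, it's 6851 versus 6850,
which is why the files have to be tiny for any of this to matter.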


> 2) The new feature provides significant performance benefit.   In this
>    case, I believe the overhead of calling setsockopt(2) is negligible
>    so the performance gain would be negligible.

The overhead of toggling it would be costly.  However, I really
don't understand why he isn't just not setting TCP_NODELAY in
the first place, since it's an affirmative option, and then
leaving the socket alone to act like it's supposed to act.


> 3) The new feature provides novel functionality that cannot be
>    achieved using the existing API (eg kqueue(2)).  The functionality
>    is already available via setsockopt(2) so this isn't applicable.

Heck; I'd argue that it can be achieved with sendfile(2), if
you leave the TCP options alone, and are willing to accept
potentially one packet's worth of overhead between back-to-back
sends from not setting TCP_NOPUSH, just by reorganizing the
sendfile(2) implementation to comply with existing default
conditionals.


> At this stage, I would suggest that you need to do better than "the
> change is cheap" to justify adding this feature.  Can you quantify
> the performance benefits, or provide some other justification?

I'd also like to see a performance comparison; the issue is
probably that, without a testbed that can drive traffic at
full Gigabit speeds, he's probably not going to be able to
show anything of statistical significance from this; at full
Gigabit speed, he could probably show CPU copy overhead that's
high enough to impact total top-end throughput, as he runs
out of CPU to do the copies.  IMO, that'd only be true if his
data set was small enough to fit in cache after the first one
or two sends.  The mbuf allocator shows the same level of
overhead, though, and you could reclaim performance there,
instead, if you were looking for low-hanging fruit.

-- Terry