Date:      Tue, 27 May 2003 21:46:54 +0400 (MSD)
From:      Igor Sysoev <is@rambler-co.ru>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        arch@freebsd.org
Subject:   Re: sendfile(2) SF_NOPUSH flag proposal
Message-ID:  <Pine.BSF.4.21.0305272137250.49494-100000@is>
In-Reply-To: <3ED38A13.524529B2@mindspring.com>

On Tue, 27 May 2003, Terry Lambert wrote:

> Igor Sysoev wrote:

> > I mean that if you have a 230-byte header then sendfile() will send
> > it in a separate packet regardless of the size of the header and of
> > the file.
> > Something like this - 230, 1460, 1460, ...
> 
> Again, see other post: this is arguably a sendfile(2) bug,
> though a really minor one; one which should be addressed in
> the sendfile(2) implementation, and doesn't need options
> added to the API in order to address it.

How do you propose to coalesce the file pages?  Wire two or more pages
to mbufs at once?

BTW, I have not seen how sendfile() works over jumbo Ethernet.  I suspect
that without TCP_NOPUSH it sometimes sends 4096- or 8192-byte packets
instead of 9000-byte ones.

> > > > it will return me 230 bytes:
> > >
> > > The "HEAD" is atypical, compared to the "GET"; the full Google
> > > front page is larger than that, and consists of multiple files;
> > > assuming you support HTTP/1.1 and pipelining, it's going to be
> > > a back-to-back transfer involving multiple sendfile() calls.
> > 
> > I use HEAD to show you the size of the HTTP header.
> > The HEAD is atypical but such a small HTTP header is typical.
> 
> Here is my problem: you are arguing both amortized cost and
> total cost, depending on which is more supportive of your
> main thesis.  These arguments are separate and orthogonal to
> each other: they don't support each other.  You can argue
> tiny files, and a relatively high total cost, or you can argue
> large files and pipelining, and a relatively high amortized
> cost, but you can't argue both tiny and large files and
> many connections and one connection at the same time.

Terry, I do not understand you.
My argument is simple: I want to avoid partial packets because doing so
decreases the number of packets.  That's all.  It has nothing to do with
amortized cost or total cost.  I do not even know what they are.

> Personally, I'd step back and get the arguments straight,
> and get an implementation that demonstrates statistically
> significant performance differences, and then come back, if
> I wanted to press the case for additional option flags.  I
> have done this several times in the past, e.g. with my soft
> interrupt coalescing implementation that's now part of most
> of the ethernet drivers people care about.
> 
> Actually, in this case, I'd just try to fix sendfile(2) to
> do the packet coalescing I'd expect, given the relative
> state of the TCP_NODELAY and TCP_NOPUSH options flags.

Actually, sendfile() already honors the TCP_NOPUSH flag.
I do not know about TCP_NODELAY; I do not work with it.
But if you turn TCP_NOPUSH on, sendfile() sends full packets.
If you turn TCP_NOPUSH off, sendfile() sends some packets partially
filled.  That is the correct behavior.
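
To put rough numbers on the difference, here is a minimal arithmetic sketch.
It is only a simplified model, not a description of the kernel: the function
name segment_sizes, the 1460-byte MSS and the 9000-byte file size are
assumptions chosen for illustration, and TCP options, window limits and
retransmissions are ignored.

```python
def segment_sizes(header_len, file_len, mss=1460, coalesce=True):
    """Return the payload size of each segment in a simplified model.

    coalesce=True models header and file data packed into one byte stream
    (roughly the TCP_NOPUSH-on case); coalesce=False models the header
    going out in a packet of its own before the file data.
    """
    chunks = [header_len + file_len] if coalesce else [header_len, file_len]
    sizes = []
    for chunk in chunks:
        full, rest = divmod(chunk, mss)
        sizes.extend([mss] * full)   # full-sized segments
        if rest:
            sizes.append(rest)       # trailing partial segment, if any
    return sizes


# Hypothetical example: 230-byte header, 9000-byte file.
separate = segment_sizes(230, 9000, coalesce=False)
packed = segment_sizes(230, 9000, coalesce=True)
print(len(separate), separate)
print(len(packed), packed)
```

In this model the separate-header case needs one packet more than the packed
case whenever the header fits into the unused tail of the last file segment.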

> BTW: I'm still wary of the initial fault on the file data, if
> it's not already in cache: arguably, it's better to start
> sending the headers, and avoid the startup latency of delaying
> sending the headers until the fault is satisfied: part of the
> thing that's going to be eating your PCI bandwidth is the
> disk I/O, and your disks are going to be the slowest data
> sources/sinks in the whole equation.

I agree, but after all it's only a delay of 20 ms or so.

> In any case, I expect that this should be handled in the
> context of TCP_NODELAY and TCP_NOPUSH, rather than by adding
> options to work around an arguably broken sendfile(2).

sendfile() already works well with TCP_NOPUSH.  I propose only flags that
allow turning TCP_NOPUSH (actually TF_NOPUSH) on and off inside sendfile().
Then in one syscall you can turn TCP_NOPUSH on, send the HTTP header and the
file pages, and turn TCP_NOPUSH off if all file pages are wired to mbufs.
And this TCP_NOPUSH state is not bound to sendfile() internals; you
can still control it via setsockopt()/getsockopt() with TCP_NOPUSH.
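
For comparison, the sequence that such a flag would fold into one syscall
looks roughly like this from userland today.  This is a hedged sketch, not
the proposed interface: the helper name send_header_and_file and its
nopush_opt parameter are hypothetical, and the caller is assumed to pass the
platform's option number (for example TCP_NOPUSH from <netinet/tcp.h> on
FreeBSD).  Python's socket.sendfile() has no header argument, so the header
is sent with a separate sendall() call here.

```python
import socket


def send_header_and_file(sock, header, fileobj, nopush_opt):
    """Send header and file with the "no push" option held on in between.

    nopush_opt is the TCP-level option number supplied by the caller;
    this helper does not assume any particular constant exists.
    """
    # Step 1: turn the option on so partial packets are held back.
    sock.setsockopt(socket.IPPROTO_TCP, nopush_opt, 1)
    try:
        # Step 2: queue the HTTP header, then the file contents.
        sock.sendall(header)
        sock.sendfile(fileobj)
    finally:
        # Step 3: turn the option off so the final partial packet goes out.
        sock.setsockopt(socket.IPPROTO_TCP, nopush_opt, 0)
```

That is at least three calls around the data transfer for every response,
which is the overhead the flags are meant to remove.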


Igor Sysoev
http://sysoev.ru/en/


