From owner-freebsd-arch@FreeBSD.ORG Tue May 27 10:46:57 2003
Date: Tue, 27 May 2003 21:46:54 +0400 (MSD)
From: Igor Sysoev
To: Terry Lambert
cc: arch@freebsd.org
In-Reply-To: <3ED38A13.524529B2@mindspring.com>
Subject: Re: sendfile(2) SF_NOPUSH flag proposal
List-Id: Discussion related to FreeBSD architecture

On Tue, 27 May 2003, Terry Lambert wrote:

> Igor Sysoev wrote:
> > I mean that if you have a 230-byte header then sendfile() will send
> > it in a separate packet regardless of the size of the header and of
> > the file. Something like this - 230, 1460, 1460, ...
>
> Again, see other post: this is arguably a sendfile(2) bug,
> though a really minor one; one which should be addressed in
> the sendfile(2) implementation, and doesn't need options
> added to the API in order to address it.

How do you propose to coalesce the file pages?
Wire two or more pages to mbufs at once?

BTW, I have not seen how sendfile() works over jumbo Ethernet.
I suspect that without TCP_NOPUSH it sometimes sends 4096- or 8192-byte
packets instead of 9000.
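
To put numbers on the "230, 1460, 1460, ..." pattern above, here is a small sketch in C. It is a simplified model, not a measurement: it assumes a fixed MSS, it assumes the header is the only place a partial packet appears, and the function names are made up for illustration.

```c
#include <stddef.h>

/* Number of TCP segments needed to carry `len` bytes at a given MSS. */
static size_t segments(size_t len, size_t mss)
{
    return (len + mss - 1) / mss;
}

/* Header pushed out in its own packet(s), file data sent after it. */
static size_t packets_separate(size_t hdr, size_t file, size_t mss)
{
    return segments(hdr, mss) + segments(file, mss);
}

/* Header and file data coalesced into one stream of full segments. */
static size_t packets_coalesced(size_t hdr, size_t file, size_t mss)
{
    return segments(hdr + file, mss);
}
```

With a 230-byte header and a 1000-byte file at MSS 1460, the separate case takes 2 packets and the coalesced case takes 1. With a 14000-byte file it is 11 against 10. In this model the saving is at most one packet per response, so it matters most for small files.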
> > > > it will return me 230 bytes:
> > >
> > > The "HEAD" is atypical, compared to the "GET"; the full Google
> > > front page is larger than that, and consists of multiple files;
> > > assuming you support HTTP/1.1 and pipelining, it's going to be
> > > a back-to-back transfer involving multiple sendfile() calls.
> >
> > I use HEAD to show you the size of the HTTP header.
> > The HEAD is atypical but such a small HTTP header is typical.
>
> Here is my problem: you are arguing both amortized cost and
> total cost, depending on which is more supportive of your
> main thesis. These arguments are separate and orthogonal to
> each other: they don't support each other. You can argue
> tiny files, and a relatively high total cost, or you can argue
> large files and pipelining, and a relatively high amortized
> cost, but you can't argue both tiny and large files and
> many connections and one connection at the same time.

Terry, I do not understand you. My argument is simple - I want to avoid
partial packets because that decreases the number of packets.
That's all. There's nothing about amortized cost or total cost.
I do not even know what they are.

> Personally, I'd step back and get the arguments straight,
> and get an implementation that demonstrates statistically
> significant performance differences, and then come back, if
> I wanted to press the case for additional option flags. I
> have done this several times in the past, e.g. with my soft
> interrupt coalescing implementation that's now part of most
> of the ethernet drivers people care about.
>
> Actually, in this case, I'd just try to fix sendfile(2) to
> do the packet coalescing I'd expect, given the relative
> state of the TCP_NODELAY and TCP_NOPUSH options flags.

Actually, sendfile() already works according to the TCP_NOPUSH flag.
I do not know about TCP_NODELAY - I do not work with it.
But if you turn TCP_NOPUSH on then sendfile() will send full packets.
If you turn TCP_NOPUSH off then sendfile() will send some packets
partially filled. That is correct behavior.

> BTW: I'm still wary of the initial fault on the file data, if
> it's not already in cache: arguably, it's better to start
> sending the headers, and avoid the startup latency of delaying
> sending the headers until the fault is satisfied: part of the
> thing that's going to be eating your PCI bandwidth is the
> disk I/O, and your disks are going to be the slowest data
> sources/sinks in the whole equation.

I agree, but after all it is a delay of 20 ms or so.

> In any case, I expect that this should be handled in the
> context of TCP_NODELAY and TCP_NOPUSH, rather than by adding
> options to work around an arguably broken sendfile(2).

sendfile() already works nicely with TCP_NOPUSH. I propose only flags
that allow turning TCP_NOPUSH (actually TF_NOPUSH) on and off inside
sendfile(). Then in one syscall you can turn TCP_NOPUSH on, send the
HTTP header and the file pages, and turn TCP_NOPUSH off if all file
pages have been wired to mbufs. This TCP_NOPUSH state is not bound to
sendfile() internals; you can still control it via
setsockopt/getsockopt(TCP_NOPUSH).

Igor Sysoev
http://sysoev.ru/en/
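
For comparison, the sequence described in the last paragraph can be approximated today with three syscalls instead of one. The following is a rough sketch, not the proposed SF_NOPUSH implementation. The names NOPUSH_OPT, set_nopush(), get_nopush() and send_header_and_file() are made up for illustration. The fallback to TCP_CORK is an assumption so that the helpers also build on Linux, and the sendfile() part is compiled only on FreeBSD and ignores partial sends on non-blocking sockets.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical macro: TCP_NOPUSH on BSD, TCP_CORK as the closest
 * Linux analogue (an assumption, only for non-BSD builds). */
#if defined(TCP_NOPUSH)
#define NOPUSH_OPT TCP_NOPUSH
#elif defined(TCP_CORK)
#define NOPUSH_OPT TCP_CORK
#else
#error "no TCP_NOPUSH-like socket option on this platform"
#endif

/* Turn the option on (on != 0) or off; returns 0 on success. */
static int set_nopush(int s, int on)
{
    int v = on ? 1 : 0;
    return setsockopt(s, IPPROTO_TCP, NOPUSH_OPT, &v, sizeof(v));
}

/* Read the option back: 1 if set, 0 if clear, -1 on error.
 * The raw value is normalized because it need not be exactly 1. */
static int get_nopush(int s)
{
    int v = 0;
    socklen_t len = sizeof(v);
    if (getsockopt(s, IPPROTO_TCP, NOPUSH_OPT, &v, &len) != 0)
        return -1;
    return v != 0;
}

#ifdef __FreeBSD__
/* Header plus whole file in full packets: NOPUSH on, sendfile(), and
 * NOPUSH off only if sendfile() reported success. */
static int send_header_and_file(int s, int fd, const void *hdr,
                                size_t hdrlen, off_t *sent)
{
    struct iovec iov = { .iov_base = (void *)hdr, .iov_len = hdrlen };
    struct sf_hdtr hdtr = { .headers = &iov, .hdr_cnt = 1,
                            .trailers = NULL, .trl_cnt = 0 };

    if (set_nopush(s, 1) != 0)
        return -1;
    /* offset 0, nbytes 0: send from the start of the file to EOF */
    if (sendfile(fd, s, 0, 0, &hdtr, sent, 0) != 0)
        return -1;
    return set_nopush(s, 0);
}
#endif
```

Because the option lives on the socket, a caller can check it at any point with get_nopush(), which matches the note that the state stays visible through getsockopt().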