From: Terry Lambert
Date: Tue, 27 May 2003 08:29:19 -0700
To: Peter Jeremy
cc: arch@freebsd.org
Subject: Re: sendfile(2) SF_NOPUSH flag proposal
Message-ID: <3ED3844F.713FB360@mindspring.com>
References: <20030526201740.GA22178@cirb503493.alcatel.com.au> <20030527102806.GC44520@cirb503493.alcatel.com.au>

Peter Jeremy wrote:
> On Tue, May 27, 2003 at 11:57:20AM +0400, Igor Sysoev wrote:
> >I thought about it more and I agree with you. TF_NOPUSH should be turned on
> >at the start of a transaction and turned off at the end of a transaction.
> >
> >So I think there should be two flags:
> >SF_NOPUSH - it turns TF_NOPUSH on before the sending.
> >It's cheap:
> >SF_PUSH - it turns TF_NOPUSH off after the sending has been completed.
>
> I agree that the code appears trivial but in order to justify its
> inclusion, you will need to demonstrate that there is some benefit to
> FreeBSD to implement this code.  Good justification would be:
>
> 1) The same API is implemented somewhere else (or there is agreement
>    between multiple groups to implement it).  I don't believe this
>    functionality is implemented anywhere else and you've not provided
>    any evidence that any other groups are considering such functionality.

Actually, the functionality can be implemented *without* going and
implementing the API.  It should really be controlled already by
the TCP_NODELAY option *not* having been set by the user, and,
for last-block next-first-block coalescing, by TCP_NOPUSH *having*
been set.

Basically, the stack is minorly misbehaving on us in the sendfile
case; effectively, it's unintentionally fragging up to one packet
between the user-supplied header (if any) and the file content,
and the file content and the user-supplied trailer (if any).

It's nothing to be terrifically concerned about, unless you are
paying by the packet, you keep your connections open a very long
time (e.g. HTTP/1.1), such that the amortized packet count is
relatively high, and your files, headers, and trailers are tiny,
enough that the frags constitute a significant portion of your
packet traffic.  In other words, you have to win the lottery.  8-).

> 2) The new feature provides significant performance benefit.  In this
>    case, I believe the overhead of calling setsockopt(2) is negligible
>    so the performance gain would be negligible.

The overhead of toggling it would be costly.  However, I really
don't understand why he isn't just not setting TCP_NODELAY in the
first place, since it's an affirmative option, and then leaving
the socket alone to act like it's supposed to act.
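For reference, the setsockopt(2) toggle under discussion could look
roughly like the sketch below.  This is only an illustration, not code
from the proposal: set_nopush() and get_nopush() are hypothetical
helper names, and falling back to Linux's TCP_CORK where TCP_NOPUSH is
not defined is an assumption made so the sketch builds off FreeBSD.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * TCP_NOPUSH is the BSD option name.  Linux has no TCP_NOPUSH; its
 * closest equivalent is TCP_CORK.  Using it as a stand-in here is an
 * assumption for portability of the sketch, not part of the proposal.
 */
#if defined(TCP_NOPUSH)
#define NOPUSH_OPT TCP_NOPUSH
#elif defined(TCP_CORK)
#define NOPUSH_OPT TCP_CORK
#else
#error "neither TCP_NOPUSH nor TCP_CORK is available"
#endif

/*
 * Hypothetical helper: set (on != 0) or clear (on == 0) the no-push
 * option on a TCP socket.  Returns 0 on success, -1 on error with
 * errno set by setsockopt(2).
 */
int
set_nopush(int sock, int on)
{
	assert(sock >= 0);
	return (setsockopt(sock, IPPROTO_TCP, NOPUSH_OPT, &on, sizeof(on)));
}

/*
 * Hypothetical helper: read the option back.  Returns 1 if set,
 * 0 if clear, -1 on error.  The kernel may report any non-zero
 * value for "set", so the result is normalised.
 */
int
get_nopush(int sock)
{
	int on = 0;
	socklen_t len = sizeof(on);

	assert(sock >= 0);
	if (getsockopt(sock, IPPROTO_TCP, NOPUSH_OPT, &on, &len) == -1)
		return (-1);
	return (on != 0);
}
```

The per-transaction pattern would then be set_nopush(s, 1) before the
header, the sendfile(2) call, and the trailer, followed by
set_nopush(s, 0) once the last byte has been queued.  That is two
extra system calls per transaction, which is the toggling cost being
argued about.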
> 3) The new feature provides novel functionality that cannot be
>    achieved using the existing API (eg kqueue(2)).  The functionality
>    is already available via setsockopt(2) so this isn't applicable.

Heck; I'd argue that it can be achieved with sendfile(2), if you
leave the TCP options alone, and are willing to accept not setting
TCP_NOPUSH for back-to-back potentially one packet worth of
overhead, just by reorganizing the sendfile(2) implementation to
comply with existing default conditionals.

> At this stage, I would suggest that you need to do better than "the
> change is cheap" to justify adding this feature.  Can you quantify
> the performance benefits, or provide some other justification?

I'd also like to see a performance comparison; the issue is
probably that, without a testbed that can drive traffic at full
Gigabit speeds, he's probably not going to be able to show anything
of statistical significance from this; at full Gigabit speed, he
could probably show CPU copy overhead that's high enough to impact
total top-end throughput, as he runs out of CPU to do the copies.

IMO, that'd only be true if his data set was small enough to fit
in cache after the first one or two sends.  The mbuf allocator
overhead shows the same level of overhead, though, and you could
reclaim performance there, instead, if you were looking for
low-hanging fruit.

-- Terry