Date: Thu, 21 May 2009 16:32:03 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Andre Oppermann
Cc: rwatson@freebsd.org, freebsd-current@freebsd.org
Subject: Re: Socket related code duplication in NFS

On Wed, 20 May 2009, Andre Oppermann wrote:

> e) The socket buffer is most efficient when it can aggregate a number of
>    packets together before they are processed.  Can the NFS code set a low
>    water mark on the socket to get called only after a few packets have
>    arrived instead of each one?  (In the select and taskqueue model.)
>
I think the answer to this one is "no". NFS traffic consists of RPC
requests and replies, which are mostly rather small messages (the write
request, read reply and readdir reply are the exceptions). NFS
performance is very sensitive to RPC RTT, which means that anything
introducing delay in getting an RPC message through (such as waiting a
little while for more data/messages) is normally a detriment, from what
I've seen.

It might be possible to handle the exceptions as a special case, but it
isn't going to be easy, since TCP doesn't preserve record boundaries, so
knowing when a large message is coming would require something like
"peeking" at the data for the RPC record marks. (Sun RPC puts a 32-bit
number in network byte order in front of each RPC message, giving its
length in bytes. A quirk on top of this is that the high-order bit of
the record mark indicates whether or not this is the last segment of a
message; i.e. an RPC message can be made up of several record-marked
segments.)
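To make the record-mark peeking concrete, here is a minimal sketch of
what decoding the mark could look like. It uses a userland-style recv()
with MSG_PEEK purely for illustration (an in-kernel version would look
at the socket buffer directly), and the function name is made up:

#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/*
 * Peek at the Sun RPC record mark at the front of the TCP stream
 * without consuming any data.  On success, *lenp is the segment
 * length in bytes and *lastp is non-zero when this is the last
 * segment of the RPC message.  Returns -1 if a full mark hasn't
 * arrived yet.
 */
static int
rpc_peek_record_mark(int s, uint32_t *lenp, int *lastp)
{
	uint32_t mark;

	if (recv(s, &mark, sizeof(mark), MSG_PEEK | MSG_WAITALL) !=
	    (ssize_t)sizeof(mark))
		return (-1);
	mark = ntohl(mark);
	*lastp = (mark & 0x80000000) != 0;	/* high bit: last segment */
	*lenp = mark & 0x7fffffff;		/* low 31 bits: length */
	return (0);
}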
> f) I've been thinking of a modular socket filter approach (much like the
>    accept filter) scanning for upper layer specific markers or boundaries
>    and then signalling data availability.
>
If by this you mean scanning for the RPC message boundaries in the TCP
stream (similar to what I said above), this could be very useful. So long
as a message gets passed along as soon as you have a complete one, this
sounds like a good idea to me.

Btw, although FreeBSD currently uses 32Kbyte reads/writes, Solaris 10 is
using up to 1Mbyte and I'd like to see that happening in FreeBSD too.
(When you have 1Mbyte write request and read reply messages, delaying an
upcall until you have an entire message might work well.)

Good luck with it, it sounds like an interesting project, rick
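As a rough illustration of the filter idea, the following sketch (reusing
the hypothetical rpc_peek_record_mark() above) only reports the socket as
"ready" once a complete record-marked segment is buffered; FIONREAD
stands in for the kernel's own view of the socket buffer here. A real
filter would also have to keep accumulating segments until one with the
last-segment bit set has arrived:

#include <sys/ioctl.h>

/*
 * Report readiness only when the record mark plus the entire segment
 * it describes have been buffered, so the upcall fires once per
 * complete segment instead of once per packet.
 */
static int
rpc_msg_ready(int s)
{
	uint32_t seglen;
	int avail, last;

	if (rpc_peek_record_mark(s, &seglen, &last) == -1)
		return (0);
	if (ioctl(s, FIONREAD, &avail) == -1)
		return (0);
	return ((uint32_t)avail >= sizeof(uint32_t) + seglen);
}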