Date: Sun, 24 Dec 2006 09:01:09 +0000 (GMT)
From: Robert Watson
To: Scott Long
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org,
    John Polstra
Subject: Re: cvs commit: src/sys/dev/bge if_bge.c
In-Reply-To: <458E11AE.2000004@samsco.org>
Message-ID: <20061224085231.Y37996@fledge.watson.org>
References: <20061223213014.U35809@fledge.watson.org> <458E11AE.2000004@samsco.org>

On Sun, 24 Dec 2006, Scott Long wrote:

>> I try this experiment every few years, and generally don't measure much
>> improvement. I'll try it again with 10gbps early next year once back in
>> the office again. The more interesting transition is between the link
>> layer and the network layer, which is high on my list of topics to look
>> into in the next few weeks. In particular, reworking the ifqueue handoff.
>> The tricky bit is balancing latency, overhead, and concurrency...
>>
>> FYI, there are several sets of patches floating around to modify if_em to
>> hand off queues of packets to the link layer, etc. They probably need
>> updating, of course, since if_em has changed quite a bit in the last year.
>> In my implementation, I add a new input routine that accepts mbuf packet
>> queues.
>
> Have you tested this with more than just your simple netblast and netperf
> tests? Have you measured CPU usage during your tests? With 10Gb coming,
> pipelined processing of RX packets is becoming an interesting topic for all
> OSes from a number of companies. I understand your feeling about the
> bottleneck being higher up than at just if_input. We'll see how this holds
> up.

In my previous test runs, I was generally testing three scenarios:

(1) Local sink - sinking small and large packet sizes to a single socket at
    a high rate.

(2) Local source - sourcing small and large packet sizes via a single socket
    at a high rate.

(3) IP forwarding - both unidirectional and bidirectional packet streams
    across an IP forwarding host with small and large packet sizes.

From the perspective of optimizing these particular paths, small packet
sizes best reveal processing overhead up to about the TCP/socket buffer
layer on modern hardware (DMA, etc). The uni/bidirectional axis is
interesting because it helps reveal the impact of the direct dispatch vs.
netisr dispatch choice for the IP layer with respect to exercising
parallelism.

I didn't explicitly measure CPU, but as the configurations max out the CPUs
in my test bed, any significant CPU reduction typically shows up as an
improvement in throughput. For example, I was easily able to measure the CPU
reduction from switching from the socket reference to the file descriptor
reference in sosend() on small packet transmit, which was a relatively minor
functional change in locking and reference counting. I have tentative plans
to explicitly measure cycle counts between context switches and during
dispatches, but have not yet implemented that in the new setup.

I expect to have a chance to set up these new test runs and get back into
experimenting with the dispatch model between the device driver, link layer,
and network layer sometime in mid-January.
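To make the queue handoff idea above concrete, the shape of such an input
routine is roughly the following. This is a minimal sketch, not the actual
if_em patches: the name if_input_chain() is invented for illustration, and
the body simply unlinks an m_nextpkt-linked chain and falls back to the
existing per-packet if_input() hook, with the point being that the driver
pays the handoff (and dispatch decision) once per batch rather than once per
packet.

/*
 * Sketch only -- not the actual patch.  A driver collects received packets
 * into an m_nextpkt-linked chain and hands the whole chain to the link
 * layer in one call.  if_input_chain() is a hypothetical name; here it just
 * unlinks the chain and dispatches via the existing per-packet hook.
 */
#include <sys/param.h>
#include <sys/socket.h>
#include <sys/mbuf.h>

#include <net/if.h>
#include <net/if_var.h>

void
if_input_chain(struct ifnet *ifp, struct mbuf *mhead)
{
	struct mbuf *m, *next;

	for (m = mhead; m != NULL; m = next) {
		next = m->m_nextpkt;
		m->m_nextpkt = NULL;
		(*ifp->if_input)(ifp, m);	/* per-packet fallback */
	}
}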
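For the cycle counting, the instrumentation would look something like the
sketch below. Again, this is only an illustration under assumptions: the
counter names and sysctl node are invented, it assumes an x86 TSC read via
the rdtsc() inline, and the counters are not updated atomically, so on SMP
(and with unsynchronized TSCs) the numbers are indicative rather than exact.

/*
 * Rough sketch of per-dispatch cycle counting, assuming x86 and the
 * rdtsc() inline from <machine/cpufunc.h>.  Counter names and the sysctl
 * node are invented for illustration; updates are not atomic, so treat the
 * results as indicative only.
 */
#include <sys/param.h>
#include <sys/socket.h>
#include <sys/mbuf.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

#include <net/if.h>
#include <net/if_var.h>

#include <machine/cpufunc.h>

static unsigned long dispatch_cycles;	/* total cycles in timed dispatches */
static unsigned long dispatch_count;	/* number of timed dispatches */

SYSCTL_ULONG(_debug, OID_AUTO, dispatch_cycles, CTLFLAG_RD,
    &dispatch_cycles, 0, "Cycles spent in timed if_input dispatches");
SYSCTL_ULONG(_debug, OID_AUTO, dispatch_count, CTLFLAG_RD,
    &dispatch_count, 0, "Number of timed if_input dispatches");

static void
timed_if_input(struct ifnet *ifp, struct mbuf *m)
{
	uint64_t start, end;

	start = rdtsc();
	(*ifp->if_input)(ifp, m);	/* the dispatch being measured */
	end = rdtsc();

	dispatch_cycles += end - start;
	dispatch_count++;
}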
As the test runs are very time-consuming, I'd welcome suggestions on the
testing before, rather than after, I run them. :-)

Robert N M Watson
Computer Laboratory
University of Cambridge