Date:      Fri, 14 Oct 2005 22:14:28 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        Garrett Wollman <wollman@csail.mit.edu>, Poul-Henning Kamp <phk@phk.freebsd.dk>, net@freebsd.org
Subject:   Re: Call for performance evaluation: net.isr.direct (fwd) 
Message-ID:  <17232.26116.41070.832908@grasshopper.cs.duke.edu>
In-Reply-To: <20051015092141.F1403@epsplex.bde.org>
References:  <17231.43525.446450.161986@grasshopper.cs.duke.edu> <13600.1129298731@critter.freebsd.dk> <17231.50841.442047.622878@grasshopper.cs.duke.edu> <20051015092141.F1403@epsplex.bde.org>


Bruce Evans writes:
 > On Fri, 14 Oct 2005, Andrew Gallatin wrote:
 > 
 > > Bear in mind that I have no clue about timekeeping.  I got into this
 > > just because I noticed using a TSC timecounter reduces context switch
 > > latency by 40% or more on all the SMP platforms I have access to:
 > >
 > > 1.0GHz dual PIII : 50% reduction vs i8254
 > > 3.06GHz 1 HTT P4 : 55% vs ACPI-safe, 70% vs i8254
 > > 2.0GHz dual amd64: 43% vs ACPI-fast, 60% vs i8254
 > >
 > > High context switch latency has been a problem in networking since
 > > FreeBSD 5, due to the context switches for netisr use and the context
 > > switches required by interrupt threads.  I'm sure it is a problem in
 > > other parts of the system as well.  I think it is pretty important,
 > > and I'd really like to see it fixed.
 > 
 > I'm not sure about that.  More the reverse.  Normal interrupts just
 > don't occur often enough for their context switch time to matter.  This
 > is most clear for disk devices.  Disk devices are relatively slow and
 > have even slower seeks, so one has to talk to them in large (~64K)
 > blocks to get reasonable performance, and this results in not many
 > transactions (except with especially braindamaged hardware that does
 > something like interrupting for every 512-byte block).  Network
 > devices have a normal packet size of ~1500 bytes, so they need
 > interrupt moderation
 > to reduce the interrupt load, and non-braindamaged ones do.  However,
 > for netisrs I think it is common to process only 1 packet per context
 > switch, at least in the loopback case.

At least with our 4Gb Myrinet NICs using a 9000-byte MTU, I have seen
TCP performance drop off if I interrupt less often than once every
30us, so only so much interrupt moderation is possible.  Once you
factor in the delay for the host to ack the interrupt, this works out
to ~25,000 interrupts per second.  Even with net.isr.direct, that's
25K context switches per second with the interrupt thread.
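
The interrupt-rate figure above can be checked with some back-of-the-envelope
arithmetic. This is an illustrative sketch only: the ~10us host ack delay is an
assumption back-computed from the ~25,000/s figure (a bare 30us period would
give ~33,000/s), treating the link as carrying 4 Gbit/s of payload ignores
protocol overhead, and the function names are hypothetical.

```python
# Rough interrupt-rate arithmetic for a NIC doing interrupt coalescing.
# Assumption: the effective interrupt period is the 30us coalescing window
# plus a host ack delay; the 10us ack delay below is back-computed from the
# ~25,000/s figure, not measured.

def interrupts_per_second(coalesce_us: float, ack_delay_us: float) -> float:
    """Interrupt rate when each interrupt costs one full period."""
    period_us = coalesce_us + ack_delay_us
    return 1_000_000 / period_us

def bytes_per_interrupt(link_gbps: float, irq_rate: float) -> float:
    """Bytes arriving between interrupts at a given (assumed) payload rate."""
    return link_gbps * 1e9 / 8 / irq_rate

rate = interrupts_per_second(coalesce_us=30, ack_delay_us=10)
print(f"{rate:,.0f} interrupts/s")               # 25,000 interrupts/s
payload = bytes_per_interrupt(link_gbps=4, irq_rate=rate)
print(f"{payload:,.0f} bytes/interrupt")         # 20,000 bytes/interrupt
print(f"{payload / 9000:.1f} 9000-byte frames")  # 2.2 9000-byte frames
```

Under these assumptions each interrupt covers only about two jumbo frames, which
is why the context-switch cost per interrupt still matters at this rate.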

Then there's latency.  If we disable interrupt coalescing, we see
roughly 10us half-RTT TCP latency with our 4Gb PCI-X cards on AMD64
(measured under Linux; I don't have FreeBSD on any AMD64 machine with
PCI-X).  I expect this to be a few us lower with our new 10GbE NICs.
Inflating that latency by 3.5us of avoidable context switch overhead
really hurts.  Heck, some people (mostly database users) care enough
about latency over sockets to risk using bizarre sockets-offload
protocols, which move a TCP connection onto another protocol and cut
this latency in half (one example is
http://www.myri.com/myrinet/performance/Sockets-MX).
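
The cost of the context switch is easiest to see as a fraction of the base
latency. A minimal sketch, using the ~10us half-RTT and ~3.5us figures from this
message; the 5us case simply assumes the halved latency that sockets offload is
said to achieve and is not a measurement. The helper name is hypothetical.

```python
# Latency-budget arithmetic using the figures quoted in the message:
# ~10us half-RTT TCP latency with coalescing disabled, and ~3.5us of
# avoidable context-switch overhead.

def inflation_pct(base_us: float, overhead_us: float) -> float:
    """Percent increase in latency caused by a fixed extra cost."""
    return 100.0 * overhead_us / base_us

base_half_rtt_us = 10.0
ctx_switch_overhead_us = 3.5

print(f"{inflation_pct(base_half_rtt_us, ctx_switch_overhead_us):.0f}%")      # 35%
# Assumption: if the base latency were halved, the same fixed overhead
# would become a proportionally larger share of the total.
print(f"{inflation_pct(base_half_rtt_us / 2, ctx_switch_overhead_us):.0f}%")  # 70%
```

A fixed per-packet cost hurts more as the rest of the path gets faster, which is
the point being made about the upcoming 10GbE NICs.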

Drew