Date: Fri, 14 Oct 2005 22:14:28 -0400 (EDT)
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: Bruce Evans <bde@zeta.org.au>
Cc: Garrett Wollman <wollman@csail.mit.edu>, Poul-Henning Kamp <phk@phk.freebsd.dk>, net@freebsd.org
Subject: Re: Call for performance evaluation: net.isr.direct (fwd)
Message-ID: <17232.26116.41070.832908@grasshopper.cs.duke.edu>
In-Reply-To: <20051015092141.F1403@epsplex.bde.org>
References: <17231.43525.446450.161986@grasshopper.cs.duke.edu> <13600.1129298731@critter.freebsd.dk> <17231.50841.442047.622878@grasshopper.cs.duke.edu> <20051015092141.F1403@epsplex.bde.org>

Bruce Evans writes:
 > On Fri, 14 Oct 2005, Andrew Gallatin wrote:
 >
 > > Bear in mind that I have no clue about timekeeping.  I got into this
 > > just because I noticed using a TSC timecounter reduces context switch
 > > latency by 40% or more on all the SMP platforms I have access to:
 > >
 > > 1.0GHz dual PIII : 50% reduction vs i8254
 > > 3.06GHz 1 HTT P4 : 55% vs ACPI-safe, 70% vs i8254
 > > 2.0GHz dual amd64: 43% vs ACPI-fast, 60% vs i8254
 > >
 > > High context switch latency has been a problem since FreeBSD 5 in
 > > networking due to the context switches for netisr use, and for the
 > > context switches required by interrupt threads.  I'm sure it is a
 > > problem in other parts of the system.  I think it is pretty important,
 > > and I'd really like to see it fixed.
 >
 > I'm not sure about that.  More the reverse.  Normal interrupts just
 > don't occur often enough for their context switch time to matter.  This
 > is most clear for disk devices.  Disk devices are relatively slow and
 > have even slower seeks, so have to talk to them in large (~64K) blocks
 > to get reasonable performance and this results in not many transactions
 > (except with especially braindamaged hardware that does something like
 > interrupting for every 512-block).  Network devices have a normal
 > packet size of ~1500 bytes so they have to have interrupt moderation
 > to reduce the interrupt load, and non-braindamaged ones do.  However,
 > for netisrs I think it is common to process only 1 packet per context
 > switch, at least in the loopback case.

At least with our 4Gb Myrinet NICs using a 9000-byte MTU, I have seen
TCP performance drop off if I interrupt less often than once every
30us.  So only so much interrupt moderation is possible.  This works
out to ~25,000 interrupts per second, when you factor in the delay for
the host to ack the interrupt.  Even with net.isr.direct, that's 25K
context switches per second with the interrupt thread.

Then there's latency.  If we disable interrupt coalescing, we have a
roughly 10us 1/2 RTT TCP latency for our 4Gb PCI-X cards on AMD64
(using Linux; I don't have FreeBSD on any AMD64 with PCI-X).  I expect
this to be a few us lower with our new 10GbE NICs.  An inflation of
the latency by 3.5us due to avoidable context switch latency really
hurts.  Heck, some people (mostly database users) care enough about
latency over sockets to risk using bizarre sockets-offload protocols
which offload a TCP connection to other protocols and cut this latency
in half (one example is
http://www.myri.com/myrinet/performance/Sockets-MX).

Drew
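
For anyone who wants to check the arithmetic in the message above, here is a rough back-of-envelope sketch. The 30us interval, the 10us half-RTT, and the 3.5us penalty come from the message. The 10us host ack delay is an assumption, picked only because it makes the result line up with the quoted ~25,000 interrupts per second; the message does not give that number. The function and constant names are made up for this illustration.

```python
# Back-of-envelope check of the figures quoted in the message.
# ACK_DELAY_US is an ASSUMPTION (not stated in the message); it is chosen
# so that the rate matches the quoted "~25,000 interrupts per second".

MIN_INTERRUPT_INTERVAL_US = 30.0  # from the message: longest usable interval
ACK_DELAY_US = 10.0               # assumed host delay to ack an interrupt
HALF_RTT_US = 10.0                # from the message: 1/2 RTT, no coalescing
CTX_SWITCH_PENALTY_US = 3.5       # from the message: avoidable added latency


def interrupts_per_second(interval_us, ack_delay_us=0.0):
    """Interrupt rate when each interrupt costs interval + ack delay."""
    return 1_000_000.0 / (interval_us + ack_delay_us)


def latency_inflation(base_us, penalty_us):
    """Fraction by which a fixed penalty inflates a base latency."""
    return penalty_us / base_us


if __name__ == "__main__":
    rate = interrupts_per_second(MIN_INTERRUPT_INTERVAL_US, ACK_DELAY_US)
    inflation = latency_inflation(HALF_RTT_US, CTX_SWITCH_PENALTY_US)
    print(f"interrupts (and ithread context switches) per second: {rate:.0f}")
    print(f"half-RTT latency inflation: {inflation:.0%}")
```

Without the assumed ack delay, a bare 30us interval would give about 33,000 interrupts per second, so the gap down to 25K is what the "delay for the host to ack" accounts for. On the latency side, 3.5us on top of a 10us half-RTT is roughly a one-third increase.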