From: John Baldwin <jhb@freebsd.org>
To: Barney Cordoba
Cc: freebsd-net@freebsd.org, Adrian Chadd, Luigi Rizzo
Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
Date: Sat, 19 Jan 2013 11:14:29 -0500
Message-Id: <201301191114.29959.jhb@freebsd.org>
List-Id: Networking and TCP/IP with FreeBSD

On Saturday, January 19, 2013 10:47:30 AM Barney Cordoba wrote:
> --- On Fri, 1/18/13, John Baldwin <jhb@freebsd.org> wrote:
> > On Friday, January 18, 2013 9:30:40 AM Barney Cordoba wrote:
> > > --- On Thu, 1/17/13, Adrian Chadd wrote:
> > > > There's also the subtle race condition in TX and RX handling
> > > > that re-queuing the taskqueue gets around.
> > > >
> > > > Which is:
> > > >
> > > > * The hardware is constantly receiving frames, right until you
> > > >   blow the FIFO away by filling it up;
> > > > * The RX thread receives a bunch of frames;
> > > > * .. and processes them;
> > > > * .. once it's done processing, the hardware may have read some
> > > >   more frames in the meantime;
> > > > * .. and the hardware may have generated a mitigated interrupt
> > > >   which you're ignoring, since you're processing frames;
> > > > * So if your architecture isn't 100% paranoid, you may end up
> > > >   having to wait for the next interrupt to handle what's
> > > >   currently in the queue.
> > > >
> > > > Now if things are done correctly:
> > > >
> > > > * The hardware generates a mitigated interrupt;
> > > > * The mask register has that bit disabled, so you don't end up
> > > >   receiving it;
> > > > * You finish your RX queue processing, and there's more stuff
> > > >   that's appeared in the FIFO (hence why the hardware has
> > > >   generated another mitigated interrupt);
> > > > * You unmask the interrupt;
> > > > * .. and the hardware immediately sends you the MSI or signals
> > > >   an interrupt;
> > > > * .. thus you re-enter the RX processing thread almost(!)
> > > >   immediately.
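[The mask/drain/unmask flow in that second list can be sketched in a few
lines of C. This is a hedged illustration, not the actual if_lem.c code:
the struct, the fake latch/mask bits, and the function names are all
invented to model the behavior being described.]

```c
/* Hedged sketch of the masked-interrupt RX pattern described above.
 * The fake "hardware" state below is invented for illustration; it is
 * not the real e1000 register interface. */
#include <stdbool.h>

struct fake_nic {
    int  fifo;          /* frames waiting in the RX FIFO */
    bool intr_masked;   /* interrupt cause masked at the chip */
    bool intr_latched;  /* a mitigated interrupt is pending */
    int  intr_count;    /* interrupts actually delivered */
};

/* Hardware side: a frame arrival latches a mitigated interrupt, which
 * is delivered only if the cause is currently unmasked. */
void nic_rx_frame(struct fake_nic *nic)
{
    nic->fifo++;
    nic->intr_latched = true;
    if (!nic->intr_masked) {
        nic->intr_count++;
        nic->intr_latched = false;
    }
}

/* Driver side: mask further interrupts, drain up to 'limit' frames,
 * then unmask.  If frames arrived while we were processing, the
 * latched interrupt is delivered the moment we unmask, so we re-enter
 * almost(!) immediately instead of waiting for a fresh frame. */
void rx_handler(struct fake_nic *nic, int limit)
{
    nic->intr_masked = true;
    while (limit-- > 0 && nic->fifo > 0)
        nic->fifo--;                 /* "process" one frame */
    nic->intr_masked = false;
    if (nic->intr_latched) {         /* unmask redelivers the pending irq */
        nic->intr_count++;
        nic->intr_latched = false;
    }
}
```

[The point of the sketch is the last four lines: unmasking while the
latch is set is what closes the window Adrian describes.]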
> > > > However, as the poster(s) have said, the interrupt mask/unmask
> > > > in the Intel driver(s) may not be 100% correct, so you're going
> > > > to end up with situations where interrupts are missed.
> > > >
> > > > The reason why this wasn't a big deal in the deep/distant past
> > > > is because we didn't used to have kernel preemption, or multiple
> > > > kernel threads running, or an overly aggressive scheduler trying
> > > > to parallelise things as much as possible. A lot of net80211/ath
> > > > bugs have popped out of the woodwork specifically because of the
> > > > above changes to the kernel. They were bugs before, but people
> > > > didn't hit them.
> > >
> > > I don't see the distinction between the RX thread getting
> > > re-scheduled "immediately" vs. introducing another thread. In
> > > fact, you increase missed interrupts by this method. The entire
> > > point of interrupt moderation is to tune the intervals at which a
> > > driver is processed.
> > >
> > > You might as well just not have a work limit and process until
> > > you're done. The idea that "gee, I've been taking up too much
> > > CPU, I'd better yield", only to queue a task and continue soon
> > > after, doesn't make much sense to me.
> >
> > If there are multiple threads with the same priority, then batching
> > the work up into chunks allows the scheduler to round-robin among
> > them. However, when a task requeues itself that doesn't actually
> > work, since the taskqueue thread will see the requeued task before
> > it yields the CPU.
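[The "see the requeued task before yielding" behavior can be modeled in
a few lines of C. This is a hedged simulation, not driver code: the
function names, the work-limit value, and the pass counter are all
invented to show why a private taskqueue never actually round-robins.]

```c
/* Hedged sketch of the "process a chunk, requeue if more" pattern
 * under discussion.  process_rxtx() drains up to 'limit' frames and
 * reports whether it requeued itself. */
#include <stdbool.h>

int process_rxtx(int *pending, int limit, bool *requeued)
{
    int done = 0;
    while (done < limit && *pending > 0) {
        (*pending)--;
        done++;
    }
    *requeued = (*pending > 0);     /* more work: queue the task again */
    return done;
}

/* The private taskqueue thread loops on its own queue.  Because the
 * requeued task is already sitting there when a pass finishes, the
 * thread runs it straight away rather than yielding the CPU, so the
 * whole backlog is consumed in consecutive passes by one thread. */
int run_taskqueue(int pending, int limit)
{
    bool requeued = true;
    int passes = 0;
    while (requeued) {
        process_rxtx(&pending, limit, &requeued);
        passes++;
    }
    return passes;
}
```

[With a 100-frame limit and 250 frames pending, the thread simply takes
three back-to-back passes; the chunking never gives another same-priority
thread a turn, which is the failure John is pointing at.]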
> > Alternatively, if you force all the relevant interrupt handlers to
> > use the same thread pool, and instead of requeueing a separate task
> > you requeue your handler in the ithread pool, then you can get the
> > desired round-robin behavior. (I have changes to the ithread stuff
> > that get us part of the way there, in that handlers can reschedule
> > themselves, and much of the plumbing is in place for shared thread
> > pools among different interrupts.)
>
> I don't see any "round robin" effect here. You have:
>
> Repeat:
>   - Process 100 frames
>   - if (more)
>       Queue a task
>
> There's only one task at a time. All it's really doing is yielding
> and rescheduling itself to resume the loop.

As I said above, in the current e1000 drivers using private taskqueues
where the taskqueue thread priority is the same as the ithread priority,
the round-robin doesn't really work: the taskqueue thread doesn't yield
when the task is rescheduled, since it will see the new task and go run
it instead of yielding.

However, I did describe an alternate setup where you can fix this. Part
of the key is to get various NICs to share a single logical queue of
tasks. You could simulate this now by having all the deferred tasks
share a single taskqueue with a pool of tasks, but that will still not
fully cooperate with ithreads. To do that you have to get the interrupt
handlers themselves into the shared taskqueue. Some changes I have in a
p4 branch allow you to do that by letting interrupt handlers reschedule
themselves (avoiding the need for a separate task and preventing the
task from running concurrently with the interrupt handler) and providing
some (but not yet all) of the framework to allow multiple devices to
share a single work queue backed by a shared pool of threads.

-- 
John Baldwin
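[The shared-logical-queue arrangement described in that last paragraph
can be sketched with a toy FIFO. This is a hedged simulation, not the
p4-branch code: the queue type, the handler ids, and the bookkeeping are
invented to show why requeueing onto a shared tail yields round-robin
between two busy NICs.]

```c
/* Hedged sketch: handlers that reschedule themselves onto one shared
 * FIFO interleave, instead of one NIC monopolizing its private thread.
 * The queue and sizes below are invented for illustration. */
#include <stdbool.h>

#define QMAX 64

struct workq {
    int items[QMAX];
    int head, tail;
};

void wq_push(struct workq *q, int id) { q->items[q->tail++ % QMAX] = id; }
int  wq_pop(struct workq *q)          { return q->items[q->head++ % QMAX]; }
bool wq_empty(const struct workq *q)  { return q->head == q->tail; }

/* Run handlers from the shared queue.  Each pass of handler 'id'
 * consumes one unit of that NIC's pending work and, if more remains,
 * reschedules itself at the tail -- i.e. behind its peer, which is
 * what produces the round-robin order recorded in 'order'. */
int run_shared(struct workq *q, int pending[], int *order)
{
    int n = 0;
    while (!wq_empty(q)) {
        int id = wq_pop(q);
        order[n++] = id;
        if (--pending[id] > 0)
            wq_push(q, id);   /* requeue at the tail, behind the peer */
    }
    return n;
}
```

[With two NICs each needing two passes, the execution order comes out
interleaved (0, 1, 0, 1) rather than one NIC running to completion,
which is the effect the private-taskqueue setup fails to deliver.]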