From: Adrian Chadd <adrian.chadd@gmail.com>
To: John Baldwin
Cc: Barney Cordoba, Luigi Rizzo, freebsd-net@freebsd.org
Date: Sat, 19 Jan 2013 17:19:19 -0800
Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()

On 19 January 2013 08:14, John Baldwin wrote:
> However, I did describe an alternate setup where you can fix this. Part of
> the key is to get various NICs to share a single logical queue of tasks.
> You could simulate this now by having all the deferred tasks share a
> single taskqueue with a pool of tasks, but that will still not fully
> cooperate with ithreads. To do that you have to get the interrupt handlers
> themselves into the shared taskqueue. Some changes I have in a p4 branch
> allow you to do that by letting interrupt handlers reschedule themselves
> (avoiding the need for a separate task and preventing the task from
> running concurrently with the interrupt handler) and providing some (but
> not yet all) of the framework to allow multiple devices to share a single
> work queue backed by a shared pool of threads.

How would that work when I want to pin devices to specific cores?

At ${WORK} we developed/acquired a company that makes highly threaded
network processors, where you want to pin specific things to specific CPUs.
If you just push network device handling into a pool of threads without
allowing for pinning, you'll end up with some very, very poor behaviour.

Windows, for example, complains loudly (read: BSODs claiming your driver is
buggy) if your tasklets take too much CPU without yielding.
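To make the pinning question concrete: a handler that reschedules itself is
already expressible with the stock taskqueue(9) KPI, something like the
sketch below (all the mydev_* names are invented for illustration; this is
not John's p4 code). What I don't see is a way to say which CPU the backing
thread runs on, and that's exactly the bit the network-processor use case
needs.

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/priority.h>
#include <sys/taskqueue.h>

struct mydev_softc {
	struct taskqueue	*sc_tq;		/* could be shared by many NICs */
	struct task		 sc_intr_task;
};

static void
mydev_intr_task(void *arg, int pending)
{
	struct mydev_softc *sc = arg;

	/*
	 * Do a bounded chunk of work; if more remains, reschedule
	 * ourselves rather than looping until the hardware goes quiet.
	 */
	if (mydev_do_bounded_work(sc))		/* invented helper */
		taskqueue_enqueue(sc->sc_tq, &sc->sc_intr_task);
	else
		mydev_enable_intr(sc);		/* invented helper */
}

static void
mydev_tq_setup(struct mydev_softc *sc)
{
	/*
	 * One thread here; a shared pool would hand the same taskqueue
	 * to every NIC at attach time.
	 */
	sc->sc_tq = taskqueue_create("mydev_tq", M_WAITOK,
	    taskqueue_thread_enqueue, &sc->sc_tq);
	taskqueue_start_threads(&sc->sc_tq, 1, PI_NET, "mydev taskq");
	TASK_INIT(&sc->sc_intr_task, 0, mydev_intr_task, sc);
}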
So at ${WORK} we do yield RX processing after a (fair) while. Maybe we do
want a way for the RX taskqueue to yield itself such that it (a) lets us
reschedule the task, and (b) actually yields at that point so other things
get a go.

Barney - yes, I think processing 100 packets each time through the loop on
a gige interface is a bit silly. My point was specifically about how to
avoid livelock without introducing artificial delays while waiting for the
next mitigated interrupt (because you don't necessarily get another
interrupt when you re-enable things, depending upon what the hardware is
doing / how buggy your driver is.)

Ideally you'd set some hard limit on how much CPU time the task consumes
before it yields, so you specifically avoid livelock under DoS conditions.

Actually, one thing I did at a previous job, many years ago now, was
weighted-random / tail dropping of frames in the driver RX handling itself,
rather than letting them go up the stack and burn all the extra CPU being
processed.

Again, my suggestion is about how to avoid livelock under highly stressful
conditions, rather than just going down the path of polling (for example.)
Rough sketches of both ideas follow.
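First, the yield-and-reschedule loop. Sketch only: rx_one_frame(),
rx_ring_has_work() and the softc layout are invented, and the one-tick CPU
budget is arbitrary; the reschedule-ourselves pattern is the same one the
driver's existing deferred task already uses.

static void
rx_task(void *arg, int pending)
{
	struct mysc *sc = arg;
	int budget = sc->rx_process_limit;
	int start = ticks;			/* coarse CPU cap: one tick */

	while (budget-- > 0 && ticks == start) {
		if (rx_one_frame(sc) == 0)	/* invented helper */
			break;			/* ring is empty */
	}

	if (rx_ring_has_work(sc)) {
		/*
		 * (a) reschedule ourselves; (b) returning hands the
		 * taskqueue thread back so other tasks get a go.
		 */
		taskqueue_enqueue(sc->tq, &sc->rx_task);
		return;
	}
	rx_enable_intr(sc);			/* re-arm only when drained */
}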
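Second, the early drop. Again, invented names and thresholds; the property
that matters is that the mbuf dies before the stack spends any cycles on
it, with the drop probability ramping up as the backlog grows (a crude
RED-ish scheme; above DROP_HI it degenerates into pure tail drop):

	/* inside the RX dequeue loop, after pulling mbuf 'm' off the ring */
	int qlen = rx_backlog(sc);		/* invented: backlog depth */

	if (qlen > DROP_LO) {
		/* drop probability ramps 0 -> 1 as qlen goes DROP_LO -> DROP_HI */
		uint32_t p = (uint32_t)(qlen - DROP_LO) * 256 /
		    (DROP_HI - DROP_LO);
		if ((arc4random() & 0xff) < p) {
			m_freem(m);		/* cheap early drop */
			continue;
		}
	}
	(*ifp->if_input)(ifp, m);	/* only survivors go up the stack */

Adrian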