From: Adrian Chadd <adrian.chadd@gmail.com>
To: John Baldwin
Cc: Barney Cordoba, Luigi Rizzo, freebsd-net@freebsd.org
Date: Sat, 19 Jan 2013 17:19:19 -0800
Subject: Re: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()

On 19 January 2013 08:14, John Baldwin wrote:
> However, I did describe an alternate setup where you can fix this. Part of
> the key is to get various NICs to share a single logical queue of tasks.
> You could simulate this now by having all the deferred tasks share a
> single taskqueue with a pool of tasks, but that will still not fully
> cooperate with ithreads. To do that you have to get the interrupt handlers
> themselves into the shared taskqueue. Some changes I have in a p4 branch
> allow you to do that by letting interrupt handlers reschedule themselves
> (avoiding the need for a separate task and preventing the task from
> running concurrently with the interrupt handler) and providing some (but
> not yet all) of the framework to allow multiple devices to share a single
> work queue backed by a shared pool of threads.

How would that work when I want to pin devices to specific cores?

At ${WORK} we developed/acquired a company that makes highly threaded
network processors, where you want to pin specific things to specific CPUs.
If you just push network device handling into a pool of threads without
allowing for pinning, you'll end up with some very, very poor behaviour.

Windows, for example, complains loudly (read: BSODs claiming your driver is
buggy) if your tasklets take too much CPU without yielding.
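To make the pinning question concrete: a handler that reschedules itself is
already expressible with the stock taskqueue(9) KPI, something like the
sketch below (all the mydev_* names are invented for illustration; this is
not John's p4 code). What I don't see is a way to say which CPU the backing
thread runs on, and that's exactly the bit the network-processor use case
needs.

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/priority.h>
#include <sys/taskqueue.h>

struct mydev_softc {
	struct taskqueue	*sc_tq;		/* could be shared by many NICs */
	struct task		 sc_intr_task;
};

static void
mydev_intr_task(void *arg, int pending)
{
	struct mydev_softc *sc = arg;

	/*
	 * Do a bounded chunk of work; if more remains, reschedule
	 * ourselves rather than looping until the hardware goes quiet.
	 */
	if (mydev_do_bounded_work(sc))		/* invented helper */
		taskqueue_enqueue(sc->sc_tq, &sc->sc_intr_task);
	else
		mydev_enable_intr(sc);		/* invented helper */
}

static void
mydev_tq_setup(struct mydev_softc *sc)
{
	/*
	 * One thread here; a shared pool would hand the same taskqueue
	 * to every NIC at attach time.
	 */
	sc->sc_tq = taskqueue_create("mydev_tq", M_WAITOK,
	    taskqueue_thread_enqueue, &sc->sc_tq);
	taskqueue_start_threads(&sc->sc_tq, 1, PI_NET, "mydev taskq");
	TASK_INIT(&sc->sc_intr_task, 0, mydev_intr_task, sc);
}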
So at ${WORK} we do yield RX processing after a (fair) while. Maybe we do
want a way for the RX taskqueue to yield itself such that it (a) lets us
reschedule the task, and (b) actually yields at that point so other things
get a go.

Barney - yes, I think processing 100 packets each time through the loop on
a gige interface is a bit silly. My point was specifically about how to
avoid livelock without introducing artificial delays while waiting for the
next mitigated interrupt (because you don't necessarily get another
interrupt when you re-enable things, depending upon what the hardware is
doing / how buggy your driver is.)

Ideally you'd set some hard limit on how much CPU time the task consumes
before it yields, so you specifically avoid livelock under DoS conditions.

Actually, one thing I did at a previous job, many years ago now, was
weighted-random / tail dropping of frames in the driver RX handling itself,
rather than letting them go up the stack and burn all the extra CPU being
processed.

Again, my suggestion is about how to avoid livelock under highly stressful
conditions, rather than just going down the path of polling (for example.)
Rough sketches of both ideas follow.
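First, the yield-and-reschedule loop. Sketch only: rx_one_frame(),
rx_ring_has_work() and the softc layout are invented, and the one-tick CPU
budget is arbitrary; the reschedule-ourselves pattern is the same one the
driver's existing deferred task already uses.

static void
rx_task(void *arg, int pending)
{
	struct mysc *sc = arg;
	int budget = sc->rx_process_limit;
	int start = ticks;			/* coarse CPU cap: one tick */

	while (budget-- > 0 && ticks == start) {
		if (rx_one_frame(sc) == 0)	/* invented helper */
			break;			/* ring is empty */
	}

	if (rx_ring_has_work(sc)) {
		/*
		 * (a) reschedule ourselves; (b) returning hands the
		 * taskqueue thread back so other tasks get a go.
		 */
		taskqueue_enqueue(sc->tq, &sc->rx_task);
		return;
	}
	rx_enable_intr(sc);			/* re-arm only when drained */
}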
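Second, the early drop. Again, invented names and thresholds; the property
that matters is that the mbuf dies before the stack spends any cycles on
it, with the drop probability ramping up as the backlog grows (a crude
RED-ish scheme; above DROP_HI it degenerates into pure tail drop):

	/* inside the RX dequeue loop, after pulling mbuf 'm' off the ring */
	int qlen = rx_backlog(sc);		/* invented: backlog depth */

	if (qlen > DROP_LO) {
		/* drop probability ramps 0 -> 1 as qlen goes DROP_LO -> DROP_HI */
		uint32_t p = (uint32_t)(qlen - DROP_LO) * 256 /
		    (DROP_HI - DROP_LO);
		if ((arc4random() & 0xff) < p) {
			m_freem(m);		/* cheap early drop */
			continue;
		}
	}
	(*ifp->if_input)(ifp, m);	/* only survivors go up the stack */

Adrian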