From owner-freebsd-net@FreeBSD.ORG  Tue Oct 29 18:31:12 2013
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 438C6D5B
 for <net@freebsd.org>; Tue, 29 Oct 2013 18:31:12 +0000 (UTC)
 (envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id AB3412BFE
 for <net@freebsd.org>; Tue, 29 Oct 2013 18:31:11 +0000 (UTC)
Received: (qmail 57064 invoked from network); 29 Oct 2013 19:01:41 -0000
Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2])
 (envelope-sender <andre@freebsd.org>)
 by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP
 for <rrs@lakerest.net>; 29 Oct 2013 19:01:41 -0000
Message-ID: <526FFED9.1070704@freebsd.org>
Date: Tue, 29 Oct 2013 19:30:49 +0100
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:24.0) Gecko/20100101 Thunderbird/24.0.1
MIME-Version: 1.0
To: Randall Stewart <rrs@lakerest.net>, net@freebsd.org
Subject: Re: MQ Patch.
References: <40948D79-E890-4360-A3F2-BEC34A389C7E@lakerest.net>
In-Reply-To: <40948D79-E890-4360-A3F2-BEC34A389C7E@lakerest.net>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 29 Oct 2013 18:31:12 -0000

On 29.10.2013 11:50, Randall Stewart wrote:
> Hi:
>
> As discussed at vBSDcon with andre/emaste and gnn, I am sending
> this patch out to all of you ;-)

I wasn't at vBSDcon but it's good that you're sending it (again). ;)

> I have previously sent it to gnn, andre, jhb, rwatson, and several other
> of the usual suspects (as gnn put it) and received dead silence.

Sorry 'bout that.  Too many things going on recently.

> What does this patch do?
>
> Well it add the ability to do multi-queue at the driver level. Basically
> any driver that uses the new interface gets under it N queues (default
> is 8) for each physical transmit ring it has. The driver picks up
> its queue 0 first, then queue 1 .. up to the max.

To make I understand this correctly there are 8 soft-queues for each real
transmit ring, correct?  And the driver will dequeue the lowest numbered
queue for as long as there are packets in it.  Like a hierarchical strict
queuing discipline.

This is prone to head of line blocking and starvation by higher priority
queues.  May become a big problem under adverse traffic patterns.

> This allows you to prioritize packets. Also in here is the start of some
> work I will be doing for AQM.. think either Pi or Codel ;-)
>
> Right now thats pretty simple and just (in a few drivers) as the ability
> to limit the amount of data on the ring… which can help reduce buffer
> bloat. That needs to be refined into a lot more.

We actually have two queues, the soft-queue and the hardware ring which
both can be rather large leading to various issues as you mention.

I've started work on an FF contract to rethink the whole IFQ* model and
to propose and benchmark different approaches.  After that to convert all
drivers in the tree to the chosen model(s) and get rid of the legacy.  In
general the choice of model will be done in the driver and no longer by
the ifnet layer.  One or (most likely) more optimized models will be
provided by the kernel for drivers to chose from.  The idea that most,
if not all drivers use these standard kernel provided models to avoid
code duplication.  However as the pace of new features is quite high
we provide the full discretion for the driver to choose and experiment
with their own ways of dealing with it.  This is under the assumption
that once a now model has been found it is later moved to the kernel
side and subsequently used by other drivers as well.

> This work is donated by Adara Networks and has been discussed in several
> of the past vendor summits.
>
> I plan on committing this before the IETF unless I hear major objections.

There seems to be a couple of white space issues where first there is a tab
and then actual whitespace for the second one and others all over the place.

There seem to be a number of unrelated changes in sys/dev/cesa/cesa.c,
sys/dev/fdt/fdt_common.c, sys/dev/fdt/simplebus.c, sys/kern/subr_bus.c,
usr.sbin/ofwdump/ofwdump.c.

It would be good to separate out the soft multi-queue changes from the ring
depth changes and do each in at least one commit.

There are two separate changes to sys/dev/oce/, one is renaming of the lock
macros and the other the change to drbr.

The changes to sys/kern/subr_bufring.c are not style compliant and we normally
don't use Linux "wb()" barriers in FreeBSD native code.  The atomics_* should
be used instead.

Why would we need a multi-consumer dequeue?

The new bufring functions on a first glance do seem to be safe on architectures
with a more relaxed memory ordering / cache coherency model than x86.

The atomic dance in a number of drbr_* functions doesn't seem to make much sense
and a single spin-lock may result in atomic operations and bus lock cycles.

There is a huge amount of includes pollution in sys/net/drbr.h which we are
currently trying to get rid of and to avoid for the future.


I like the general conceptual approach but the implementation feels bumpy and
not (yet) ready for prime time.  In any case I'd like to take forward conceptual
parts for the FF sponsored IFQ* rework.

-- 
Andre