Date:      Tue, 29 Oct 2013 16:20:08 -0400
From:      Randall Stewart <rrs@lakerest.net>
To:        Luigi Rizzo <rizzo@iet.unipi.it>
Cc:        Andre Oppermann <andre@freebsd.org>, "freebsd-net@freebsd.org" <net@freebsd.org>
Subject:   Re: MQ Patch.
Message-ID:  <13BF1F55-EC13-482B-AF7D-59AE039F877D@lakerest.net>
In-Reply-To: <CA+hQ2+gTc87M0f5pvFeW_GCZDogrLkT_1S2bKHngNcDEBUeZYQ@mail.gmail.com>
References:  <40948D79-E890-4360-A3F2-BEC34A389C7E@lakerest.net> <526FFED9.1070704@freebsd.org> <CA+hQ2+gTc87M0f5pvFeW_GCZDogrLkT_1S2bKHngNcDEBUeZYQ@mail.gmail.com>

Luigi:

comments inline..


On Oct 29, 2013, at 3:58 PM, Luigi Rizzo wrote:

> my short, top-post comment is that I'd rather see some more
> coordination with Andre, and especially some high level README
> or other form of documentation explaining the architecture
> you have in mind before this goes in.
>
> To expand my point of view (and please do not read me as negative,
> i am trying to be constructive and avoid future troubles and
> volunteer to help with the design and implementation):
>
> (i'll omit issues re. style and unrelated patches in the diff
> because they are premature)
>
> 1. Having multiple separate software queues attached to a physical queue
> makes sense only if we have a clear and documented plan
> for scheduling traffic from these queues into the hw one.
> Otherwise it ends up being just another confusing hack
> that makes it difficult to reason about device drivers.
>
> We already have something similar now (with the drbr queue on top
> used in some cases when the hw ring overflows), the ALTQ hooks,
> and without documentation this does not seem to improve the
> current situation.
>


Well, I can't get Adara to give up how it uses these in its product; I was
lucky to get them to give back the low-level work.

The problem with ALTQ is that it is really broken if you want any sort
of decent performance with queueing. However, with a small bit of work
(i.e., throw away the ALTQ queues themselves and have ALTQ place the
ac_qos number in here and queue the packet) you could have ALTQ
transmitting at line rate with proper QoS.

> 2. QoS is not just priority scheduling or AQM a-la RED/CODEL/PI,
> but a coherent framework where you can classify/partition traffic
> into separate queues, apply one of several queue management
> (taildrop/RED/CODEL/whatever) and scheduling (which queue to serve next)
> policies in an efficient way.
>
> Linux mostly gets this right (they even support hierarchical schedulers).

Which is also what ALTQ attempts to do. Again, I can't get Adara
to give their top-level code.. but someone *could* (hint hint) hook ALTQ up
to this and have a reasonable performance model with ALTQ...


>
> Dummynet has a reasonable architecture although not hierarchical
> and it operates at the IP level (or possibly at layer 2),
> which is probably too high (but not necessarily).
> We can also recycle the components, i.e. the classifier in ipfw
> and the scheduling algorithms. I am happy to help on this.
>
> ALTQ is too old and complex and inefficient and unmaintained to be considered.

Exactly..

>
> And i cannot comment on your code because you don't really explain
> what you want to do and how. Codel/PI are only queue management,
> not qos; and strict priority is just one (and probably the worst)
> policy one can have.

Of course, but you need them if you want to prevent buffer bloat.
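For reference, the drop decision at the heart of CoDel looks roughly like
the sketch below. This is a minimal illustration with hypothetical names
and millisecond timestamps, not the code in this patch (real CoDel uses
fine-grained timestamps and a fixed-point inverse square root):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CODEL_TARGET_MS   5    /* acceptable standing-queue delay */
#define CODEL_INTERVAL_MS 100  /* window used to detect persistent delay */

struct codel_state {
	bool     dropping;    /* currently in the drop state? */
	uint32_t drop_count;  /* drops since entering the drop state */
	int64_t  first_above; /* deadline after sojourn first exceeded target */
	int64_t  drop_next;   /* scheduled time of the next drop */
};

/* Toy integer sqrt; real implementations use fixed-point Newton steps. */
static uint32_t
isqrt(uint32_t x)
{
	uint32_t r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return (r);
}

/* Control law: space successive drops by interval / sqrt(count). */
static int64_t
codel_control_law(int64_t t, uint32_t count)
{
	return (t + CODEL_INTERVAL_MS / isqrt(count));
}

/*
 * Decide whether the packet dequeued at 'now', which spent 'sojourn_ms'
 * in the queue, should be dropped.
 */
static bool
codel_should_drop(struct codel_state *cs, int64_t now, int64_t sojourn_ms)
{
	if (sojourn_ms < CODEL_TARGET_MS) {
		/* Queue drained below target: leave the drop state. */
		cs->first_above = 0;
		cs->dropping = false;
		return (false);
	}
	if (cs->first_above == 0) {
		/* Start the interval timer; don't drop yet. */
		cs->first_above = now + CODEL_INTERVAL_MS;
		return (false);
	}
	if (!cs->dropping && now >= cs->first_above) {
		/* Delay stayed above target a full interval: start dropping. */
		cs->dropping = true;
		cs->drop_count = 1;
		cs->drop_next = codel_control_law(now, cs->drop_count);
		return (true);
	}
	if (cs->dropping && now >= cs->drop_next) {
		cs->drop_count++;
		cs->drop_next = codel_control_law(now, cs->drop_count);
		return (true);
	}
	return (false);
}
```

The point of the control law is that drops get closer together the longer
the delay persists, which squeezes the standing queue toward the target
without tail-dropping bursts.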


>
> One comment i can make, however, on the fact that 256 queues are
> way too few for a proper system. You need the number to be
> dynamic and much larger (e.g. using flowid as a key).
>
> So, to conclude: i fully support any plan to design something that lets us
> implement scheduling (and qos, if you want to call it this way)
> in a reasonable way, but what is in your patch now does not really
> seem to improve the current situation in any way.
>


It's the step toward fixing this that I am allowed to give. I can see
why companies get frustrated with trying to give anything to the project.

R

> cheers
> luigi
>
> On Tue, Oct 29, 2013 at 11:30 AM, Andre Oppermann <andre@freebsd.org> wrote:
> On 29.10.2013 11:50, Randall Stewart wrote:
> Hi:
>
> As discussed at vBSDcon with andre/emaste and gnn, I am sending
> this patch out to all of you ;-)
>
> I wasn't at vBSDcon but it's good that you're sending it (again). ;)
>
> I have previously sent it to gnn, andre, jhb, rwatson, and several other
> of the usual suspects (as gnn put it) and received dead silence.
>
> Sorry 'bout that.  Too many things going on recently.
>
> What does this patch do?
>
> Well it adds the ability to do multi-queue at the driver level. Basically
> any driver that uses the new interface gets under it N queues (default
> is 8) for each physical transmit ring it has. The driver picks up
> its queue 0 first, then queue 1 .. up to the max.
>
> To make sure I understand this correctly: there are 8 soft-queues for each real
> transmit ring, correct?  And the driver will dequeue the lowest numbered
> queue for as long as there are packets in it.  Like a hierarchical strict
> queuing discipline.
>
> This is prone to head of line blocking and starvation by higher priority
> queues.  May become a big problem under adverse traffic patterns.
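The service order under discussion can be sketched as follows. A toy
array-backed FIFO stands in for the patch's drbr/buf_ring soft-queues;
all names here are illustrative, not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

#define NQUEUES 8	/* soft-queues per hardware ring, per the patch */
#define QDEPTH  64

/* Toy soft-queue; the patch uses drbr/buf_ring, this only shows order. */
struct softq {
	void	*pkts[QDEPTH];
	int	 head, tail;
};

static int
softq_empty(const struct softq *q)
{
	return (q->head == q->tail);
}

static void
softq_enqueue(struct softq *q, void *m)
{
	q->pkts[q->tail++ % QDEPTH] = m;
}

/*
 * Strict priority service: always drain the lowest-numbered non-empty
 * queue first.  Queue 7 transmits only when queues 0..6 are all empty.
 */
static void *
mq_dequeue(struct softq qs[NQUEUES])
{
	for (int i = 0; i < NQUEUES; i++)
		if (!softq_empty(&qs[i]))
			return (qs[i].pkts[qs[i].head++ % QDEPTH]);
	return (NULL);
}
```

Since low-numbered queues always win regardless of arrival order, a
sustained load on queue 0 starves everything below it, which is exactly
the adverse-traffic hazard described above.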
>
> This allows you to prioritize packets. Also in here is the start of some
> work I will be doing for AQM.. think either Pi or Codel ;-)
>
> Right now that's pretty simple and just (in a few drivers) adds the ability
> to limit the amount of data on the ring… which can help reduce buffer
> bloat. That needs to be refined into a lot more.
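Capping the ring in bytes rather than packets bounds the worst-case drain
latency at limit_bytes / line rate (e.g. roughly 125 KB is about 1 ms at
1 Gb/s). A minimal sketch of such accounting, with illustrative names that
are not from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring byte accounting. */
struct ring_limit {
	uint32_t inflight_bytes;  /* bytes currently on the hw ring */
	uint32_t limit_bytes;     /* cap: a few ms worth at line rate */
};

/*
 * Admit a packet onto the hw ring only if it fits under the byte cap;
 * otherwise the caller leaves it on the soft-queue.
 */
static bool
ring_try_admit(struct ring_limit *rl, uint32_t pktlen)
{
	if (rl->inflight_bytes + pktlen > rl->limit_bytes)
		return (false);
	rl->inflight_bytes += pktlen;
	return (true);
}

/* Called from the tx-completion path as descriptors are reclaimed. */
static void
ring_complete(struct ring_limit *rl, uint32_t pktlen)
{
	rl->inflight_bytes -= pktlen;
}
```

The excess queueing then lives in the soft-queues, where an AQM like PI or
CoDel can actually see and act on the delay, instead of sitting
untouchable on the hardware ring.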
>
> We actually have two queues, the soft-queue and the hardware ring which
> both can be rather large, leading to various issues as you mention.
>
> I've started work on an FF contract to rethink the whole IFQ* model and
> to propose and benchmark different approaches.  After that, to convert all
> drivers in the tree to the chosen model(s) and get rid of the legacy.  In
> general the choice of model will be done in the driver and no longer by
> the ifnet layer.  One or (most likely) more optimized models will be
> provided by the kernel for drivers to choose from.  The idea is that most,
> if not all, drivers use these standard kernel-provided models to avoid
> code duplication.  However, as the pace of new features is quite high,
> we give drivers full discretion to choose and experiment
> with their own ways of dealing with it.  This is under the assumption
> that once a new model has been found it is later moved to the kernel
> side and subsequently used by other drivers as well.
>
> This work is donated by Adara Networks and has been discussed in several
> of the past vendor summits.
>
> I plan on committing this before the IETF unless I hear major objections.
>
> There seem to be a couple of whitespace issues where first there is a tab
> and then actual whitespace for the second one, and others all over the place.
>
> There seem to be a number of unrelated changes in sys/dev/cesa/cesa.c,
> sys/dev/fdt/fdt_common.c, sys/dev/fdt/simplebus.c, sys/kern/subr_bus.c,
> usr.sbin/ofwdump/ofwdump.c.
>
> It would be good to separate out the soft multi-queue changes from the ring
> depth changes and do each in at least one commit.
>
> There are two separate changes to sys/dev/oce/, one is renaming of the lock
> macros and the other the change to drbr.
>
> The changes to sys/kern/subr_bufring.c are not style compliant and we normally
> don't use Linux "wb()" barriers in FreeBSD native code.  The atomic_* ops should
> be used instead.
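The release/acquire pairing this refers to can be sketched with C11
atomics. In FreeBSD kernel code the equivalents would be
atomic_store_rel_32()/atomic_load_acq_32(); C11 is used here only so the
sketch is self-contained, and the single-producer/single-consumer ring is
illustrative, not the buf_ring code from the patch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8	/* must be a power of two */

struct spsc_ring {
	void		*slots[RING_SIZE];
	_Atomic uint32_t prod;	/* next slot to fill */
	_Atomic uint32_t cons;	/* next slot to drain */
};

static int
ring_enqueue(struct spsc_ring *r, void *m)
{
	uint32_t p = atomic_load_explicit(&r->prod, memory_order_relaxed);
	uint32_t c = atomic_load_explicit(&r->cons, memory_order_acquire);

	if (p - c == RING_SIZE)
		return (-1);			/* full */
	r->slots[p & (RING_SIZE - 1)] = m;	/* fill the slot... */
	/*
	 * ...then publish it: the release store orders the slot write
	 * before the index update, replacing an explicit wmb().
	 */
	atomic_store_explicit(&r->prod, p + 1, memory_order_release);
	return (0);
}

static void *
ring_dequeue(struct spsc_ring *r)
{
	uint32_t c = atomic_load_explicit(&r->cons, memory_order_relaxed);
	/*
	 * The acquire load pairs with the producer's release store,
	 * replacing an explicit rmb(): if we see the new index, we are
	 * guaranteed to see the slot contents too.
	 */
	uint32_t p = atomic_load_explicit(&r->prod, memory_order_acquire);

	if (c == p)
		return (NULL);			/* empty */
	void *m = r->slots[c & (RING_SIZE - 1)];
	atomic_store_explicit(&r->cons, c + 1, memory_order_release);
	return (m);
}
```

The barrier is carried by the atomic op on the index itself rather than by
a standalone fence, which is both the FreeBSD idiom and cheaper on
weakly-ordered architectures.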
>
> Why would we need a multi-consumer dequeue?
>
> The new bufring functions on a first glance do seem to be safe on architectures
> with a more relaxed memory ordering / cache coherency model than x86.
>
> The atomic dance in a number of drbr_* functions doesn't seem to make much sense
> and a single spin-lock may result in fewer atomic operations and bus lock cycles.
>
> There is a huge amount of includes pollution in sys/net/drbr.h which we are
> currently trying to get rid of and to avoid for the future.
>
> I like the general conceptual approach but the implementation feels bumpy and
> not (yet) ready for prime time.  In any case I'd like to take forward conceptual
> parts for the FF sponsored IFQ* rework.
>
> --
> Andre
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
> --
> -----------------------------------------+-------------------------------
>  Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
>  http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
>  TEL      +39-050-2211611               . via Diotisalvi 2
>  Mobile   +39-338-6809875               . 56122 PISA (Italy)
> -----------------------------------------+-------------------------------

------------------------------
Randall Stewart
803-317-4952 (cell)



