Date:      Tue, 29 Oct 2013 22:44:40 +0100
From:      Andre Oppermann <andre@freebsd.org>
To:        Adrian Chadd <adrian@freebsd.org>
Cc:        Luigi Rizzo <rizzo@iet.unipi.it>, Randall Stewart <rrs@lakerest.net>, "freebsd-net@freebsd.org" <net@freebsd.org>
Subject:   Re: MQ Patch.
Message-ID:  <52702C48.3010706@freebsd.org>
In-Reply-To: <CAJ-VmokJaBhZE+3ZDsi0Yybuvtb_d7AH_RThCKs4inUM=UQrAQ@mail.gmail.com>
References:  <40948D79-E890-4360-A3F2-BEC34A389C7E@lakerest.net> <526FFED9.1070704@freebsd.org> <CA+hQ2+gTc87M0f5pvFeW_GCZDogrLkT_1S2bKHngNcDEBUeZYQ@mail.gmail.com> <13BF1F55-EC13-482B-AF7D-59AE039F877D@lakerest.net> <52701F7E.2060604@freebsd.org> <CAJ-VmokJaBhZE+3ZDsi0Yybuvtb_d7AH_RThCKs4inUM=UQrAQ@mail.gmail.com>

On 29.10.2013 22:02, Adrian Chadd wrote:
> [snip everything]
>
> ok, I've reviewed the work.
>
> TL;DR - it's a clearly correct step in the right direction, but I
> think we need to just think it through a tad bit more first.
>
> There have been queue discipline and queue management discussions in
> the past. Randall's work is a good step in that direction.
>
> I think though that we can take a step back up a little further.
>
> * In terms of queuing frames into multiple queues - yes, we absolutely
> should have an if_transmit() path to the driver that obeys "a" QoS
> field in the mbuf and pushes it into the relevant queue - with
> Randall's work, it's in the driver, but it doesn't _have_ to be;

Only the driver can know how much it can do in hardware and how
much has to be emulated in software.  The kernel should provide
a couple of optimized software emulations that drivers can link into.
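
To make this concrete, here is a minimal sketch of such a split.
Everything in it (the QCAP_* flags, qos_attach()) is hypothetical
illustration, not existing FreeBSD API: the driver declares what its
hardware can do and the kernel links in software emulation for the
rest.

#include <stdint.h>

#define QCAP_MULTIRING  0x01    /* several hardware TX rings */
#define QCAP_PRIO       0x02    /* hardware honors a priority field */
#define QCAP_RATELIMIT  0x04    /* per-queue hardware rate limiting */

struct ifnet;

/* Returns opaque qos state combining hw features with sw emulation. */
void    *qos_attach(struct ifnet *ifp, uint32_t hw_caps);

/*
 * In a driver attach routine,
 *
 *      sc->qos = qos_attach(ifp, QCAP_MULTIRING);
 *
 * would leave QCAP_PRIO and QCAP_RATELIMIT to software emulation.
 */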

> * In terms of queue servicing and management - we likely need to have
> a variety of queue plugins that determine which frame from which queue
> gets chosen next to hand to the hardware. The hardware may have
> multiple queues! The hardware may have one queue! The application
> developer may only want to use one queue! That should be flexible and
> easy to plug into things.

We have to get rid of the current (and mostly mental) model of a
software queue.  The software queue only exists a) for historical
reasons, because the first interfaces didn't have any DMA rings at
all; and b) to manage concurrent access to a single or limited shared
resource.  In reality the DMA ring is deep enough and is *all the
queue* we need.

> * Then we need to support dropping frames during queue and dropping
> frames during dequeue (ie, on its way to the hardware). That way we
> can implement the currently interesting kinds of queue disciplines (eg
> CoDel, etc.)

DMA rings are by definition tail drop.  If you want to do active QoS
and queue management, you trade DMA ring size for software queue
size.  However, this is only really an issue for routed traffic.
With TCP, getting an ENOBUFS on a send attempt is perfectly valid,
and the send socket buffer works as our queue.  There is no need to
buffer deeply yet once more in software before the DMA ring.  The
only thing is that TCP needs some polish in that area to prevent it
from treating ENOBUFS as a loss event.  Maybe Lawrence can audit and
adjust the relevant parts of tcp_output()'s error handling.  It
should simply try again a few milliseconds later without waiting for
a retransmit timeout or for the ACK clock to restart.
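
In code, that could look roughly like the sketch below.  The two
helper functions are stubs I made up for illustration; only the idea
that ENOBUFS is local ring exhaustion rather than network loss comes
from the paragraph above.

#include <errno.h>

struct tcpcb;                           /* opaque for this sketch */

/* Hypothetical stand-ins for the real kernel machinery. */
int     tcp_ip_output(struct tcpcb *tp);        /* hand segment to IP */
void    tcp_sched_send_retry(struct tcpcb *tp, int msec);

int
tcp_output_sketch(struct tcpcb *tp)
{
        int error = tcp_ip_output(tp);

        if (error == ENOBUFS) {
                /*
                 * The DMA ring is full.  Nothing was lost in the
                 * network, so leave cwnd/ssthresh and the retransmit
                 * timer alone; just retry the send a few ms from now.
                 */
                tcp_sched_send_retry(tp, 10);
                return (0);
        }
        return (error);
}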

> * Should this be done at the driver layer (ie it's a library that each
> driver creates and owns), or as a layer above it, controlling the
> network device (ie, the linux queue discipline method.)

If the hardware actually supports it, then it should be done in the
driver.  Otherwise the QoS and queue management get shimmed in by
hijacking the (*if_transmit) function pointer, doing the work in
software and ticking the packets out through TX complete callbacks
(or, alternatively, a timer as in dummynet).
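
As a sketch (all qshim_* names are made up; only the (*if_transmit)
member of struct ifnet is real), such a shim could look like this:

#include <errno.h>

struct ifnet;
struct mbuf;

/* Hypothetical per-interface shim state. */
struct qshim {
        int     (*real_transmit)(struct ifnet *, struct mbuf *);
        /* ... software queue and qos policy state ... */
};

struct qshim    *qshim_lookup(struct ifnet *ifp);
int              qshim_enqueue(struct qshim *qs, struct mbuf *m);
struct mbuf     *qshim_next(struct qshim *qs);
void             qshim_requeue(struct qshim *qs, struct mbuf *m);

/* Installed in place of the driver's own (*if_transmit). */
int
qshim_transmit(struct ifnet *ifp, struct mbuf *m)
{
        /* Classify and queue in software; may drop (e.g. CoDel). */
        return (qshim_enqueue(qshim_lookup(ifp), m));
}

/*
 * Runs from the driver's TX complete callback (or a dummynet-style
 * timer): tick queued packets out onto the freshly drained ring.
 */
void
qshim_txcomplete(struct ifnet *ifp)
{
        struct qshim *qs = qshim_lookup(ifp);
        struct mbuf *m;

        while ((m = qshim_next(qs)) != NULL) {
                if (qs->real_transmit(ifp, m) == ENOBUFS) {
                        qshim_requeue(qs, m);   /* ring full again */
                        break;
                }
        }
}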

> So, my comments:
>
> * I don't like how it's hard-coding drbr's into the drivers. Yes, the
> underlying state should be a drbr for now. But I'd rather we have a
> queue discipline plugin API that drivers create an instance of.

Full ACK.  That's the plan.

> * It'll have methods to init/flush the rings, queue a frame into a
> ring, dequeue a frame from a ring, be notified of transmit completions
> so more work can be done, etc.

Pretty much.  Drivers will be required to implement certain
functionality to manage the DMA ring depth and to provide a TX
completion callback into the software qos/queue shim, but not into
the upper stack.
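
A rough sketch of such a plugin contract, entirely made up (nothing
here is an existing FreeBSD interface):

struct ifnet;
struct mbuf;

/*
 * One queue-discipline instance per interface; a discipline may drop
 * on enqueue or on dequeue (CoDel-style).  The driver, not the upper
 * stack, owns the instance and drives dequeue/txcomplete from its
 * interrupt path.
 */
struct qdisc_ops {
        void    *(*init)(struct ifnet *ifp, int nrings); /* alloc state */
        void     (*flush)(void *q);                    /* drop all queued */
        int      (*enqueue)(void *q, struct mbuf *m);  /* may drop */
        struct mbuf *(*dequeue)(void *q, int ring);    /* next to send */
        void     (*txcomplete)(void *q, int ring, int ndesc); /* descs freed */
};

struct qdisc {
        const struct qdisc_ops  *ops;
        void                    *state;
};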

> * For people who do latency-sensitive things, they can just bypass
> this entirely and go straight to the hardware queues without going
> through this kind of intermediary queue layer.

IMHO this should be the default anyway, with some provision to manage
contention by multiple cores; for example, by having a single packet
slot for each core for the case where the DMA ring is already locked
by another core.
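
A sketch of that per-core slot (hypothetical; locking of the slots
against the draining core is elided for brevity):

#include <errno.h>

#define NCPU    64                      /* stand-in for MAXCPU */

struct mbuf;
struct mtx;

int     mtx_trylock_(struct mtx *m);    /* stand-ins for mtx_trylock() */
void    mtx_unlock_(struct mtx *m);     /* and mtx_unlock() */
int     ring_put(void *hw, struct mbuf *m); /* onto the hw descriptors */

struct txring {
        struct mtx      *lock;
        void            *hw;
        struct mbuf     *slot[NCPU];    /* one parked packet per core */
};

int
ring_transmit(struct txring *r, struct mbuf *m, int cpu)
{
        int i;

        if (!mtx_trylock_(r->lock)) {
                /*
                 * Another core owns the ring: park one packet in this
                 * core's slot; the lock holder drains all slots below
                 * before unlocking.
                 */
                if (r->slot[cpu] != NULL)
                        return (ENOBUFS);       /* slot taken: drop */
                r->slot[cpu] = m;
                return (0);
        }
        ring_put(r->hw, m);
        for (i = 0; i < NCPU; i++) {            /* drain parked packets */
                if (r->slot[i] != NULL) {
                        ring_put(r->hw, r->slot[i]);
                        r->slot[i] = NULL;
                }
        }
        mtx_unlock_(r->lock);
        return (0);
}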

> Randall - I think we can take your work and turn it into a net library
> that implements your queue management routines. That way we can start
> enabling people to tinker with it and replace it if they need to.

Moving struct ifnet and the drivers to the new model and making ifnet
opaque has already been signed up for by Gleb and me.  When that is
in place in the next few weeks, any kind of queue model can be
implemented at the driver's discretion, including Randall's.

-- 
Andre



