Date:      Tue, 29 Oct 2013 22:47:50 -0700
From:      Jack Vogel <jfvogel@gmail.com>
To:        Navdeep Parhar <np@freebsd.org>
Cc:        Luigi Rizzo <rizzo@iet.unipi.it>, Andre Oppermann <andre@freebsd.org>, Randall Stewart <rrs@lakerest.net>, "freebsd-net@freebsd.org" <net@freebsd.org>
Subject:   Re: MQ Patch.
Message-ID:  <CAFOYbcm44v4yP4v05DiHURePsHH=SYJexdUAt0MsQZtu6RTVMA@mail.gmail.com>
In-Reply-To: <5270309E.5090403@FreeBSD.org>
References:  <40948D79-E890-4360-A3F2-BEC34A389C7E@lakerest.net> <526FFED9.1070704@freebsd.org> <CA+hQ2+gTc87M0f5pvFeW_GCZDogrLkT_1S2bKHngNcDEBUeZYQ@mail.gmail.com> <52701D8B.8050907@freebsd.org> <527022AC.4030502@FreeBSD.org> <527027CE.5040806@freebsd.org> <5270309E.5090403@FreeBSD.org>

I find myself agreeing with Navdeep: what Windows does might provide a hint
(my god, did I say that :)).  The driver provides hints to the kernel, but
it's from "above" that the ultimate decisions are made, based on what the
hardware hints are.  So it's not either/or, it's both...
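
To make the "hints from below, decisions from above" split concrete, here is a
rough sketch; the if_txq_hint structure and if_set_txq_hint() are invented
names for illustration, not an existing ifnet KPI.  The driver only describes
what the hardware has, once at attach time, and never picks a queue itself:

struct ifnet;				/* opaque handle here */

/* Hypothetical hint the driver hands up; not an existing KPI. */
struct if_txq_hint {
	int	nqueues;	/* hw tx queues behind this ifnet */
	int	lockless;	/* rings can be driven per-CPU without a lock */
};

/* Hypothetical registration call that would be provided by the stack. */
void	if_set_txq_hint(struct ifnet *, const struct if_txq_hint *);

static void
drv_attach_txqueues(struct ifnet *ifp, int nqueues)
{
	struct if_txq_hint hint;

	hint.nqueues = nqueues;		/* what the hardware offers... */
	hint.lockless = 0;
	if_set_txq_hint(ifp, &hint);	/* ...while the policy lives above */
}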

Jack




On Tue, Oct 29, 2013 at 3:03 PM, Navdeep Parhar <np@freebsd.org> wrote:

> On 10/29/13 14:25, Andre Oppermann wrote:
> > On 29.10.2013 22:03, Navdeep Parhar wrote:
> >> On 10/29/13 13:41, Andre Oppermann wrote:
> >>> Let me jump in here and explain roughly the ideas/path I'm exploring
> >>> in creating and eventually implementing a big picture for drivers,
> >>> queues, queue management, various QoS and so on:
> >>>
> >>> Situation: we're still mostly based on the old 4.4BSD IFQ model, and
> >>> the couple of work-arounds (sndring, drbr) plus the bit-rotten ALTQ we
> >>> have in tree aren't helpful at all.
> >>>
> >>> Steps:
> >>>
> >>> 1. take the soft-queuing method out of the ifnet layer and make it
> >>>     a property of the driver, so that the upper stack (or actually
> >>>     protocol L3/L2 mapping/encapsulation layer) calls (*if_transmit)
> >>>     without any queuing at that point.  It then is up to the driver
> >>>     to decide how it multiplexes multi-core access to its queue(s)
> >>>     and how they are configured.
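
For concreteness, a rough sketch of what step 1 would look like at the call
site (not a diff against the tree, just the shape of it): the L3/L2
encapsulation layer hands the frame straight to the driver, with no IFQ
enqueue in between, and whatever queuing or locking is needed happens behind
(*if_transmit):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

static int
encap_output(struct ifnet *ifp, struct mbuf *m)
{
	/*
	 * No IFQ_ENQUEUE()/if_start() dance at this layer anymore; the
	 * driver decides how to multiplex multi-core access to its rings.
	 */
	return ((*ifp->if_transmit)(ifp, m));
}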
> >>
> >> It would work out much better if the kernel was aware of the number of
> >> tx queues of a multiq driver and explicitly selected one in if_transmit.
> >>   The driver has no information on the CPU affinity etc. of the
> >> applications generating the traffic; the kernel does.  In general, the
> >> kernel has a much better "global view" of the system and some of the
> >> stuff currently in the drivers really should move up into the stack.
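
Presumably the stack side would then look something like the sketch below.
The if_txq_view structure, its ntxqueues field and the queue-aware transmit
hook are made-up names (today's (*if_transmit) only takes ifp and the mbuf);
M_FLOWID, the flowid pkthdr field and curcpu are the existing kernel bits:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <sys/pcpu.h>		/* curcpu */

struct ifnet;

/* Hypothetical queue-aware view of an interface; not in struct ifnet today. */
struct if_txq_view {
	struct ifnet	*ifp;
	u_int		ntxqueues;	/* from the driver's hint */
	int		(*transmit_q)(struct ifnet *, struct mbuf *, u_int);
};

static int
stack_select_and_transmit(struct if_txq_view *v, struct mbuf *m)
{
	u_int qidx;

	/*
	 * The stack, not the driver, maps the flow onto one of the
	 * driver's advertised tx queues: use the flowid when present,
	 * fall back to the caller's CPU otherwise.
	 */
	if (m->m_flags & M_FLOWID)
		qidx = m->m_pkthdr.flowid % v->ntxqueues;
	else
		qidx = curcpu % v->ntxqueues;

	return (v->transmit_q(v->ifp, m, qidx));
}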
> >
> > I've been thinking a lot about this and have come to the preliminary
> > conclusion that the upper stack should not tell the driver which queue
> > to use.  There are way too many possible approaches that, depending on
> > the use case, perform better or worse.  Also we have a big problem with
> > cores vs. queues mismatches either way (more cores than queues or more
> > queues than cores, though the latter is much less of a problem).
> >
> > For now I see these primary multi-hardware-queue approaches to be
> > implemented first:
> >
> > a) the driver's (*if_transmit) takes the flowid from the mbuf header and
> >    selects one of the N hardware DMA rings based on it.  Each of the DMA
> >    rings is protected by a lock.  Here the assumption is that with enough
> >    DMA rings the contention on each of them will be relatively low, and
> >    ideally a flow and its ring more or less stick to the core that sends
> >    lots of packets into that flow.  Of course it is a statistical
> >    certainty that some bouncing will be going on.  (A rough sketch of
> >    this and (b) follows after the list.)
> >
> > b) the driver assigns the DMA rings to particular cores, which, through a
> >    critnest++, can then drive them lockless.  The driver's (*if_transmit)
> >    will look up the core it got called on and push the traffic out on
> >    that DMA ring.  The problem is the actual upper stack's affinity,
> >    which is not guaranteed.  This has two consequences: there may be
> >    reordering of packets of the same flow because the protocol's send
> >    function happens to be called from a different core the second time,
> >    or the driver's (*if_transmit) has to switch to the right core to
> >    complete the transmit for this flow if the upper stack
> >    migrated/bounced around.  It is rather difficult to assure full
> >    affinity from userspace down through the upper stack and then to the
> >    driver.
> >
> > c) non-multi-queue capable hardware uses a kernel-provided set of
> >    functions to manage the contention for the single resource of a
> >    DMA ring.
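
For concreteness, a rough sketch of what (a) and (b) could look like inside a
driver.  The drv_* names and softc layout are invented; curcpu,
critical_enter()/critical_exit(), the mtx(9) calls and the M_FLOWID/flowid
pkthdr bits are the real kernel primitives:

#include <sys/param.h>
#include <sys/systm.h>		/* critical_enter()/critical_exit() */
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/mbuf.h>
#include <sys/pcpu.h>		/* curcpu */
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

/* Hypothetical per-ring and per-device state, for illustration only. */
struct drv_txring {
	struct mtx	mtx;
	/* ... descriptor ring, producer/consumer indices, ... */
};

struct drv_softc {
	u_int			num_tx_rings;
	struct drv_txring	*tx_rings;	/* num_tx_rings entries */
};

int	drv_ring_enqueue(struct drv_txring *, struct mbuf *);	/* hypothetical */

/* (a) flowid-hashed DMA rings, one mutex per ring. */
static int
drv_transmit_flowid(struct ifnet *ifp, struct mbuf *m)
{
	struct drv_softc *sc = ifp->if_softc;
	struct drv_txring *txr;
	u_int i;
	int error;

	if (m->m_flags & M_FLOWID)
		i = m->m_pkthdr.flowid % sc->num_tx_rings;
	else
		i = curcpu % sc->num_tx_rings;	/* fall back to the caller's CPU */
	txr = &sc->tx_rings[i];

	mtx_lock(&txr->mtx);	/* contention stays low if flows spread over rings */
	error = drv_ring_enqueue(txr, m);
	mtx_unlock(&txr->mtx);
	return (error);
}

/* (b) one ring per CPU, driven lockless inside a critical section. */
static int
drv_transmit_percpu(struct ifnet *ifp, struct mbuf *m)
{
	struct drv_softc *sc = ifp->if_softc;
	int error;

	/* Assumes one ring was allocated per CPU (num_tx_rings == mp_ncpus). */
	critical_enter();	/* stay on this CPU; its private ring needs no lock */
	error = drv_ring_enqueue(&sc->tx_rings[curcpu], m);
	critical_exit();

	/*
	 * Caveat from (b): if the upper stack migrated between two sends of
	 * the same flow, the packets land on different rings and may be
	 * reordered, or the transmit has to be bounced to the "right" CPU.
	 */
	return (error);
}

Which of the two wins probably comes down to how well sender affinity holds up
in practice, which is exactly the open question above.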
> >
> > The point here is that the driver is the right place to make these
> > decisions, because the upper stack lacks (and shouldn't care about)
> > knowledge of the actually available hardware and its capabilities.  All
> > the necessary information is available to the driver as well, through
> > the appropriate mbuf header fields and the core it is called on.
> >
>
> I mildly disagree with most of this, specifically with the part that the
> driver is the right place to make these decisions.  But you did say this
> was a "preliminary conclusion" so there's hope yet ;-)
>
> Let's wait till you have an early implementation and we are all able to
> experiment with it.  To be continued...
>
> Regards,
> Navdeep
>


