Date:      Thu, 6 Dec 2012 09:35:16 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Barney Cordoba <barney_cordoba@yahoo.com>, freebsd-net@freebsd.org
Subject:   Re: Latency issues with buf_ring
Message-ID:  <alpine.BSF.2.00.1212060929430.78351@fledge.watson.org>
In-Reply-To: <201212041108.17645.jhb@freebsd.org>
References:  <1353259441.19423.YahooMailClassic@web121605.mail.ne1.yahoo.com> <201212041108.17645.jhb@freebsd.org>

On Tue, 4 Dec 2012, John Baldwin wrote:

>> Q2: Are there any case studies or benchmarks for buf_ring, or it is just 
>> blindly being used because someone claimed it was better and offered it for 
>> free? One of the points of locking is to avoid race conditions, so the
>> fact that you have races in a supposed lock-less scheme seems more than just 
>> ironic.
>
> The buf_ring author claims it has benefits in high pps workloads.  I am not 
> aware of any benchmarks, etc.

... joining this conversation a bit late -- still about two years behind on 
net@ :-) ...

There are several places where having a good buf_ring primitive should offer 
significant benefits over blocking locks around queues:

- ifnet transmit enqueue path, whether owned by the general stack (ifqueue) or
   the driver (as is often the case with if_transmit).

- netisr queues used in deferred input dispatch, including loopback.

- A future lockless hand-off of inbound TCP segments from the ithread/netisr
   to an already running user thread a la Van Jacobson's proposal to the Linux
   community (now implemented), which would significantly reduce contention on
   inpcb locks in many workloads.

I've measured significant lock contention in all those places in the past, and 
I believe buf_ring was intended to address at least the first case.  This 
isn't the same as having benchmarks showing that the current code is "better", 
but the right primitive used in the right way should almost certainly help all 
of those cases substantially.  I know that when Philip Paeps was working with 
the Solarflare driver, switching to lockless dispatch in the outbound path 
made a significant difference.  One thing we do need to make sure is handled 
well is bounds on queue length, since we don't want infinitely long queues 
when a backlog begins to form -- there's no reason this can't be done, 
although the specifics depend on what one wants to accomplish and how.
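
To illustrate the point about bounded queue length, here is a minimal sketch 
of a fixed-capacity lockless ring.  It is not the buf_ring code: it assumes a 
single producer and a single consumer, uses C11 atomics rather than the 
kernel's atomic(9) primitives, and the names (spsc_ring, ring_enqueue, 
ring_dequeue, RING_SIZE) are hypothetical.  A full ring simply rejects the 
enqueue, leaving the drop or back-pressure policy to the caller.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u    /* hypothetical capacity; must be a power of two */

struct spsc_ring {
    _Atomic unsigned head;  /* next slot to fill; written only by the producer */
    _Atomic unsigned tail;  /* next slot to drain; written only by the consumer */
    void *slots[RING_SIZE];
};

/* Producer side: returns false instead of growing when the ring is full. */
static bool
ring_enqueue(struct spsc_ring *r, void *item)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* Acquire pairs with the consumer's release store to tail. */
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;   /* bounded: caller drops or applies back-pressure */
    r->slots[head & (RING_SIZE - 1)] = item;
    /* Release publishes the slot contents before the new head is visible. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns NULL when the ring is empty. */
static void *
ring_dequeue(struct spsc_ring *r)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* Acquire pairs with the producer's release store to head. */
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    void *item;

    if (head == tail)
        return NULL;
    item = r->slots[tail & (RING_SIZE - 1)];
    /* Release hands the slot back to the producer only after it was read. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return item;
}
```

The explicit acquire/release pairs are what keep the sketch correct on 
architectures with weaker memory ordering.  Supporting multiple producers, as 
a transmit path shared between CPUs would need, requires a compare-and-swap 
on the head index and is considerably harder to get right.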

I would like to see us making use of lockless queue primitives in these kinds 
of scenarios, motivated by benchmarking, and ideally addressing architectures 
with weaker memory consistency properly.  We should definitely minimise the 
number of different implementations of those primitives as much as possible, 
since (as with locks themselves) they are very hard to get right, and 
debugging them can be very difficult.

Robert


