From owner-svn-src-projects@FreeBSD.ORG Wed Jan 14 14:33:06 2015 Return-Path: Delivered-To: svn-src-projects@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id F20B4961; Wed, 14 Jan 2015 14:33:05 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C9280A1D; Wed, 14 Jan 2015 14:33:05 +0000 (UTC) Received: from new-host.home (pool-173-70-85-31.nwrknj.fios.verizon.net [173.70.85.31]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id D1A95B91F; Wed, 14 Jan 2015 09:33:04 -0500 (EST) Message-ID: <54B67E20.3090701@FreeBSD.org> Date: Wed, 14 Jan 2015 09:33:04 -0500 From: John Baldwin User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: Gleb Smirnoff Subject: Re: svn commit: r277122 - projects/ifnet/sys/dev/msk References: <201501130902.t0D927NE077024@svn.freebsd.org> <5330876.Sb1U9Iz8Cz@ralph.baldwin.cx> <20150113231735.GZ15484@FreeBSD.org> In-Reply-To: <20150113231735.GZ15484@FreeBSD.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 14 Jan 2015 09:33:04 -0500 (EST) Cc: svn-src-projects@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-projects@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "SVN commit messages for the src " projects" tree" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Jan 2015 14:33:06 -0000 On 1/13/15 6:17 PM, Gleb Smirnoff wrote: > John, > > On Tue, Jan 13, 2015 at 01:22:09PM -0500, John Baldwin wrote: > J> > o Convert from xxx_start(ifp) to xxx_transmit(ifp, m). > J> > A simple conversion itself is quite straight: > J> > * In ifdriver declaration define .ifdrv_maxqlen, take the value > J> > from IFQ_SET_MAXLEN() macro. > J> > * In ifdriver ifops declaration define if_transmit function. > J> > * Rename xxx_start() to xxx_transmit() and change its prototype. > J> > * The new named xxx_transmit() should: > J> > - Try to if_snd_enqueue() the mbuf or return. > J> > - Try to mtx_lock() the driver softc or return. > J> > J> Please no. This is a major source of pain with the current drivers that use > J> buf_ring that is worked around in various (often broken) ways. The problem > J> is that the other thread holding the lock might drop it right after your try > J> lock fails and then the packet you just queued doesn't go out until you queue > J> another packet. That's all fine and good if you are blasting packets out an > J> interface. It is not fine and good if you have sparse traffic. That packet > J> is now potentially delayed indefinitely. (In a real world scenario with a > J> heartbeat protocol that sent out timing packets once a second or so to measure > J> RTT between hosts this manifested as odd timings because we would see the > J> timings jump by a second every so often when the packet was delayed until the > J> next heartbeat). > > Yes, I also see this problem when coding that. > > J> This kind of design decision is why I do not want to just blindly convert > J> everything to suboptimal if_transmit routines. The current ifq based if_start > J> model is better for 10/100 single-queue drivers than this. > > If I convert mtx_trylock() to mtx_lock(), then we will get 1:1 conversion > to the current model. But, I still want to do an improvement while converting > drivers. :) The trylock isn't an improvement. I also worry that this approach is going to result in a lot of copy and pasted code in various drivers as they all reimplement the current if_transmit() (effectively). > May be the xxx_transmit() should check the queue again after softc unlock? I posted some ideas about how to handle this in a thread several years ago on net@ with various alternatives. In that case I was focused on buf_ring and I settled on an approach where a draining thread marked the queue as "busy" while it was draining it and cleared that flag before checking the head of the queue. The enqueue process returned a different errno value (EINPROGRESS or some such) if it queued a packet into a "busy" queue and the transmit routines were changed to 1) always enqueue the packet, and 2) if EINPROGRESS wasn't returned, use a blocking mtx_lock and start transmitting. However, even this model has some downsides in that one thread might be stuck transmitting packets queued by other threads and never pop back out to userland to service its associated application (a kind of starvation of the user side of the thread). Note that the mtx_trylock approach has the same problem. It might be nice to have a sort of limit on the number of packets a thread is willing to enqueue, but then you have the problem of ensuring any packets still on the queue when it hits its limit aren't also delayed indefinitely. I don't recall the exact mechanics of how Navdeep's mp_ring addresses this (though I believe it does from when I looked at it). Regardless, I think this was my point I attempted to make on IRC earlier: you need to figure out what you are going to do here first before you go through and convert all the drivers. Otherwise you will be stuck making multiple passes. Converting a "real" driver up front is useful so you can prototype different solutions, but I think you need to resolve this now before continuing your pass as the current approach is not suitable to be merged into HEAD. -- John Baldwin