From: John Baldwin <jhb@freebsd.org>
To: Luigi Rizzo <rizzo@iet.unipi.it>
Date: Tue, 22 Nov 2011 08:43:20 -0500
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p8; KDE/4.5.5; amd64; ; )
References: <CALJ8J_HPZewO12uanb=kctQYwepMssr63E0DQh9CqV6PGaC=JA@mail.gmail.com>
	<201111211129.29362.jhb@freebsd.org>
	<20111121173614.GA63552@onelab2.iet.unipi.it>
In-Reply-To: <20111121173614.GA63552@onelab2.iet.unipi.it>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201111220843.21207.jhb@freebsd.org>
Cc: Matteo Landi <matteo@matteolandi.net>, Doug Barton <dougb@freebsd.org>,
	freebsd-current@freebsd.org
Subject: Re: ixgbe and fast interrupts

On Monday, November 21, 2011 12:36:15 pm Luigi Rizzo wrote:
> On Mon, Nov 21, 2011 at 11:29:29AM -0500, John Baldwin wrote:
> > On Friday, November 18, 2011 5:04:58 pm Luigi Rizzo wrote:
> > > On Fri, Nov 18, 2011 at 11:16:00AM -0800, Doug Barton wrote:
> > > > On 11/18/2011 09:54, Luigi Rizzo wrote:
> > > > > One more thing (i am mentioning it here for archival purposes,
> > > > > as i keep forgetting to test it). Is entropy harvesting expensive ?
> > > > 
> > > > No. It was designed to be inexpensive on purpose. :)
> > > 
> > > hmmm....
> > > unfortunately I don't have a chance to test it until monday
> > > (probably one could see if the ping times change by modifying
> > > the value of kern.random.sys.harvest.* ).
> > > 
> > > But in the code i see the following:
> > > 
> > > - the harvest routine is this:
> > > 
> > >     void
> > >     random_harvest(void *entropy, u_int count, u_int bits, u_int frac,
> > > 	enum esource origin)
> > >     {
> > >         if (reap_func)
> > >                 (*reap_func)(get_cyclecount(), entropy, count, bits, frac,
> > >                     origin);
> > >     }
> > > 
> > > - the reap_func seems to be bound to
> > > 
> > >     dev/random/randomdev_soft.c::random_harvest_internal()
> > > 
> > >   which internally uses a spinlock and then moves entries between
> > >   two lists.
> > > 
> > > I am concerned that the get_cyclecount() might end up querying an
> > > expensive device (is it using kern.timecounter.hardware ?)
> > 
> > On modern x86 it just does rdtsc().
> > 
> > > So between the indirect function call, spinlock, list manipulation
> > > and the cyclecounter i wouldn't be surprised if the whole thing
> > > takes a microsecond or so.
> > 
> > I suspect it is not quite that expensive.
> > 
> > > Anyways, on monday i'll know better. in the meantime, if someone
> > > wants to give it a try... in our tests between two machines and
> > > ixgbe (10G) interfaces, an unmodified 9.0 kernel has a median ping
> > > time of 30us with "slow" pings (say -i 0.01 or larger) and 17us with
> > > a ping -f .
> > 
> > Did you time it with harvest.interrupt disabled?
> 
> yes, thanks for reminding me to post the results.
> 
> Using unmodified ping (which has 1us resolution on the reports),
> there is no measurable difference irrespective
> of the setting of kern.random.sys.harvest.ethernet,
> kern.random.sys.harvest.interrupt and kern.timecounter.hardware.
> Have tried to set hw mitigation to 0 on the NIC (ixgbe on both
> sides) but there is no visible effect either.

I had forgotten that kern.random.sys.harvest.interrupt only matters for
interrupts whose drivers pass the INTR_ENTROPY flag to bus_setup_intr() when
registering their handlers.  I suspect your drivers aren't doing that anyway.

> However I don't trust my measurements because i cannot explain them.
> Response times have a min of 20us (about 50 out of 5000 samples)
> and a median of 27us, and i really don't understand if the low
> readings are real or the result of some races.

Hmm, 7 us does seem a bit much for a spread.

> Ping does a gettimeofday() for the initial timestamp, and relies
> on in-kernel timestamp for the response.

Hmm, gettimeofday() isn't super cheap.  What I do for measuring RTT is to
use an optimized echo server (not the one in inetd) on the remote host and
reflect packets off of that.  The sending host puts a TSC timestamp into
the packet payload and, when the reflected response comes back, computes the
TSC delta against that timestamp.  I then run ministat over the TSC deltas
to get RTT in TSC counts and use machdep.tsc_freq of the sending machine to
convert the TSC delta values to microseconds.
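
For the conversion step, something along these lines should do.  This is
only a rough sketch, not the tool I actually use: the function names are
made up for illustration, and it assumes the frequency argument is the
machdep.tsc_freq value (in Hz) read on the sending machine and that the TSC
runs at a constant rate for the duration of the test.

```c
#include <assert.h>
#include <stdint.h>

/*
 * RTT in TSC counts: the receive-time TSC minus the timestamp carried in
 * the reflected payload.  Both stamps must come from the same machine.
 * (Hypothetical helper name, for illustration only.)
 */
uint64_t
tsc_rtt_cycles(uint64_t sent_tsc, uint64_t recv_tsc)
{
	/* Unsigned subtraction stays correct if the counter wraps once. */
	return (recv_tsc - sent_tsc);
}

/*
 * Convert a TSC delta to microseconds.  tsc_freq_hz is assumed to be the
 * sender's machdep.tsc_freq value; it must be non-zero.
 */
double
tsc_delta_to_usec(uint64_t tsc_delta, uint64_t tsc_freq_hz)
{
	assert(tsc_freq_hz != 0);
	return ((double)tsc_delta * 1e6 / (double)tsc_freq_hz);
}
```

With a 2.4 GHz TSC, for example, a delta of 64800 counts works out to 27 us.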

-- 
John Baldwin