Date:      Tue, 22 Nov 2011 08:43:20 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        Luigi Rizzo <rizzo@iet.unipi.it>
Cc:        Matteo Landi <matteo@matteolandi.net>, Doug Barton <dougb@freebsd.org>, freebsd-current@freebsd.org
Subject:   Re: ixgbe and fast interrupts
Message-ID:  <201111220843.21207.jhb@freebsd.org>
In-Reply-To: <20111121173614.GA63552@onelab2.iet.unipi.it>
References:  <CALJ8J_HPZewO12uanb=kctQYwepMssr63E0DQh9CqV6PGaC=JA@mail.gmail.com> <201111211129.29362.jhb@freebsd.org> <20111121173614.GA63552@onelab2.iet.unipi.it>

On Monday, November 21, 2011 12:36:15 pm Luigi Rizzo wrote:
> On Mon, Nov 21, 2011 at 11:29:29AM -0500, John Baldwin wrote:
> > On Friday, November 18, 2011 5:04:58 pm Luigi Rizzo wrote:
> > > On Fri, Nov 18, 2011 at 11:16:00AM -0800, Doug Barton wrote:
> > > > On 11/18/2011 09:54, Luigi Rizzo wrote:
> > > > > One more thing (i am mentioning it here for archival purposes,
> > > > > as i keep forgetting to test it). Is entropy harvesting expensive ?
> > > > 
> > > > No. It was designed to be inexpensive on purpose. :)
> > > 
> > > hmmm....
> > > unfortunately I don't have a chance to test it until monday
> > > (probably one could see if the ping times change by modifying
> > > the value of kern.random.sys.harvest.* ).
> > > 
> > > But in the code i see the following:
> > > 
> > > - the harvest routine is this:
> > > 
> > >     void
> > >     random_harvest(void *entropy, u_int count, u_int bits, u_int frac,
> > > 	enum esource origin)
> > >     {
> > >         if (reap_func)
> > >                 (*reap_func)(get_cyclecount(), entropy, count, bits, frac,
> > >                     origin);
> > >     }
> > > 
> > > - the reap_func seems to be bound to
> > > 
> > >     dev/random/randomdev_soft.c::random_harvest_internal()
> > > 
> > >   which internally uses a spinlock and then moves entries between
> > >   two lists.
> > > 
> > > I am concerned that the get_cyclecount() might end up querying an
> > > expensive device (is it using kern.timecounter.hardware ?)
> > 
> > On modern x86 it just does rdtsc().
> > 
> > > So between the indirect function call, spinlock, list manipulation
> > > and the cyclecounter i wouldn't be surprised if the whole thing
> > > takes a microsecond or so.
> > 
> > I suspect it is not quite that expensive.
> > 
> > > Anyways, on monday i'll know better. in the meantime, if someone
> > > wants to give it a try... in our tests between two machines and
> > > ixgbe (10G) interfaces, an unmodified 9.0 kernel has a median ping
> > > time of 30us with "slow" pings (say -i 0.01 or larger) and 17us with
> > > a ping -f .
> > 
> > Did you time it with harvest.interrupt disabled?
> 
> yes, thanks for reminding me to post the results.
> 
> Using unmodified ping (which has 1us resolution on the reports),
> there is no measurable difference irrespective
> of the setting of kern.random.sys.harvest.ethernet,
> kern.random.sys.harvest.interrupt and kern.timecounter.hardware.
> Have tried to set hw mitigation to 0 on the NIC (ixgbe on both
> sides) but there is no visible effect either.

I had forgotten that kern.random.sys.harvest.interrupt only matters if the
interrupt handlers pass the INTR_ENTROPY flag to bus_setup_intr().  I
suspect your drivers probably aren't doing that anyway.

> However I don't trust my measurements because i cannot explain them.
> Response times have a min of 20us (about 50 out of 5000 samples)
> and a median of 27us, and i really don't understand if the low
> readings are real or the result of some races.

Hmm, 7 us does seem a bit much for a spread.

> Ping does a gettimeofday() for the initial timestamp, and relies
> on in-kernel timestamp for the response.

Hmm, gettimeofday() isn't super cheap.  What I do for measuring RTT is to
use an optimized echo server (not the one in inetd) on the remote host and
reflect packets off of that.  The sender/receiver puts a TSC timestamp into
the packet payload and computes a TSC delta when it receives the reflected
response.  I then run ministat over the TSC deltas to get RTT in TSC counts
and use machdep.tsc_freq of the sending machine to convert the TSC delta
values to microseconds.
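
As a rough illustration of that last conversion step (a minimal sketch, not
the actual tool described above): the helper names below are hypothetical,
the TSC frequency is assumed to be the value read from machdep.tsc_freq on
the sending machine, and the median is only a simple stand-in for the fuller
statistics ministat reports.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Convert a TSC cycle delta to microseconds.  tsc_freq_hz is assumed to be
 * the sender's machdep.tsc_freq value (cycles per second).
 */
static double
tsc_delta_to_us(uint64_t delta, uint64_t tsc_freq_hz)
{
	assert(tsc_freq_hz != 0);
	return ((double)delta * 1e6 / (double)tsc_freq_hz);
}

/* qsort(3) comparator for uint64_t values, ascending order. */
static int
cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return ((x > y) - (x < y));
}

/*
 * Sort the deltas in place and return the median (the upper of the two
 * middle values when n is even).
 */
static uint64_t
median_delta(uint64_t *deltas, size_t n)
{
	assert(n > 0);
	qsort(deltas, n, sizeof(deltas[0]), cmp_u64);
	return (deltas[n / 2]);
}
```

For example, with a 3 GHz TSC a median delta of 75000 cycles corresponds to
an RTT of 25 us.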

-- 
John Baldwin


