From: John Baldwin
To: Luigi Rizzo
Cc: Matteo Landi, Doug Barton, freebsd-current@freebsd.org
Date: Tue, 22 Nov 2011 08:43:20 -0500
Subject: Re: ixgbe and fast interrupts

On Monday, November 21, 2011 12:36:15 pm Luigi Rizzo wrote:
> On Mon, Nov 21, 2011 at 11:29:29AM -0500, John Baldwin wrote:
> > On Friday, November 18, 2011 5:04:58 pm Luigi Rizzo wrote:
> > > On Fri, Nov 18, 2011 at 11:16:00AM -0800, Doug Barton wrote:
> > > > On 11/18/2011 09:54, Luigi Rizzo wrote:
> > > > > One more thing (i am mentioning it here for archival purposes,
> > > > > as i keep forgetting to test it). Is entropy harvesting expensive ?
> > > >
> > > > No. It was designed to be inexpensive on purpose. :)
> > >
> > > hmmm....
> > > unfortunately I don't have a chance to test it until monday
> > > (probably one could see if the ping times change by modifying
> > > the value of kern.random.sys.harvest.* ).
> > >
> > > But in the code i see the following:
> > >
> > > - the harvest routine is this:
> > >
> > >         void
> > >         random_harvest(void *entropy, u_int count, u_int bits, u_int frac,
> > >             enum esource origin)
> > >         {
> > >                 if (reap_func)
> > >                         (*reap_func)(get_cyclecount(), entropy, count, bits, frac,
> > >                             origin);
> > >         }
> > >
> > > - the reap_func seems to be bound to
> > >
> > >         dev/random/randomdev_soft.c::random_harvest_internal()
> > >
> > > which internally uses a spinlock and then moves entries between
> > > two lists.
> > >
> > > I am concerned that the get_cyclecount() might end up querying an
> > > expensive device (is it using kern.timecounter.hardware ?)
> >
> > On modern x86 it just does rdtsc().
> >
> > > So between the indirect function call, spinlock, list manipulation
> > > and the cyclecounter i wouldn't be surprised if the whole thing
> > > takes a microsecond or so.
> >
> > I suspect it is not quite that expensive.
> >
> > > Anyways, on monday i'll know better. in the meantime, if someone
> > > wants to give it a try... in our tests between two machines and
> > > ixgbe (10G) interfaces, an unmodified 9.0 kernel has a median ping
> > > time of 30us with "slow" pings (say -i 0.01 or larger) and 17us with
> > > a ping -f .
> >
> > Did you time it with harvest.interrupt disabled?
>
> yes, thanks for reminding me to post the results.
>
> Using unmodified ping (which has 1us resolution on the reports),
> there is no measurable difference irrespective
> of the setting of kern.random.sys.harvest.ethernet,
> kern.random.sys.harvest.interrupt and kern.timecounter.hardware.
> Have tried to set hw mitigation to 0 on the NIC (ixgbe on both
> sides) but there is no visible effect either.

I had forgotten that kern.random.sys.harvest.interrupt only matters if the
interrupt handlers pass the INTR_ENTROPY flag to bus_setup_intr(). I suspect
your drivers probably aren't doing that anyway.

> However I don't trust my measurements because i cannot explain them.
> Response times have a min of 20us (about 50 out of 5000 samples)
> and a median of 27us, and i really don't understand if the low
> readings are real or the result of some races.

Hmm, 7 us does seem a bit much for a spread.

> Ping does a gettimeofday() for the initial timestamp, and relies
> on in-kernel timestamp for the response.

Hmm, gettimeofday() isn't super cheap. What I do for measuring RTT is to use
an optimized echo server (not the one in inetd) on the remote host and reflect
packets off of that. The sender/receiver puts a TSC timestamp into the packet
payload and computes a TSC delta when it receives the reflected response. I
then run ministat over the TSC deltas to get RTT in TSC counts and use
machdep.tsc_freq of the sending machine to convert the TSC delta values to
microseconds.

-- 
John Baldwin
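A minimal sketch of the TSC-in-payload RTT measurement described above, in C to match the code quoted in the thread. It assumes an x86 host with an invariant TSC and a GCC/Clang-style __rdtsc() from <x86intrin.h>, and assumes machdep.tsc_freq is a 64-bit value (as on amd64). The helper names (stamp_payload, rtt_ticks, ticks_to_usec, get_tsc_freq, read_tsc) are hypothetical, and the echo server and socket code are not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Hypothetical helper: raw TSC read. Assumes an invariant TSC; both
 * reads for one RTT sample happen on the sending machine. */
static inline uint64_t read_tsc(void) { return __rdtsc(); }
#endif

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
/* Hypothetical helper: TSC frequency in Hz from machdep.tsc_freq
 * (assumed to be a 64-bit sysctl). Returns 0 on failure. */
uint64_t get_tsc_freq(void)
{
    uint64_t freq = 0;
    size_t len = sizeof(freq);

    if (sysctlbyname("machdep.tsc_freq", &freq, &len, NULL, 0) != 0)
        return 0;
    return freq;
}
#endif

/* Store the send-time TSC at the start of the payload. Host byte order
 * is fine: the echo server only reflects it, and only the sender reads it. */
void stamp_payload(unsigned char *payload, uint64_t tsc)
{
    memcpy(payload, &tsc, sizeof(tsc));
}

/* On receipt of the reflected packet: TSC ticks elapsed since stamping. */
uint64_t rtt_ticks(const unsigned char *payload, uint64_t now)
{
    uint64_t sent;

    memcpy(&sent, payload, sizeof(sent));
    return now - sent;
}

/* Convert a TSC delta to microseconds using the sender's TSC frequency (Hz). */
double ticks_to_usec(uint64_t ticks, uint64_t tsc_freq)
{
    return (double)ticks * 1e6 / (double)tsc_freq;
}
```

Both TSC reads for a sample are taken on the sending machine, so the two hosts' TSCs never need to agree. Since the tick-to-microsecond conversion is linear, printing one rtt_ticks() value per line for ministat and scaling its summary by 1e6 / tsc_freq gives the same figures as converting each sample first.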