From owner-freebsd-performance@FreeBSD.ORG Sun Jan 30 12:55:37 2011
Date: Sun, 30 Jan 2011 15:55:32 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Bruce Evans
Cc: freebsd-performance@FreeBSD.org, Julian Elischer, Stefan Lambrev
Subject: Re: Interrupt performance
Message-ID: <20110130125532.GO18170@zxy.spb.ru>
In-Reply-To: <20110129233542.O20731@besplex.bde.org>

On Sat, Jan 29, 2011 at 11:54:11PM +1100, Bruce Evans wrote:

> > And I see a dramatically smaller number of context switches in the
> > Linux stats (from dstat).
>
> FreeBSD uses ithreads for most interrupts, so of course it does many
> more context switches (at least 2 per interrupt).  This doesn't make
> much difference provided there are not too many.  I think the version
> of re that you are using actually uses "fast" interrupts and a task
> queue.  This also seems to be making little difference.  You get a
> relatively lightweight "fast" interrupt followed by a context switch
> to and from the task.  IIRC, your statistics showed about twice as
> many context switches as interrupts, so the task queue isn't doing
> much to reduce the "interrupt overhead" -- it just gives context
> switches to the task instead of to an ithread.

Now I have built a kernel with polling and profiling. Network
performance with profiling compiled in (but turned off) doesn't change.
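The "fast interrupt plus taskqueue" pattern Bruce describes looks
roughly like the sketch below. This is only an illustration, not the
actual re(4) code: the mydev_* names and chip helpers are hypothetical
placeholders; the KPIs (bus_setup_intr(9), taskqueue(9)) are real.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/bus.h>
    #include <sys/taskqueue.h>

    struct mydev_softc {
            device_t        dev;
            struct task     int_task;       /* deferred interrupt work */
    };

    /* Hypothetical chip helpers, bodies not shown. */
    static void mydev_intr_disable(struct mydev_softc *);
    static void mydev_intr_enable(struct mydev_softc *);
    static void mydev_rxtx(struct mydev_softc *);

    /*
     * Filter: runs in raw interrupt context, so it only does cheap
     * work: mask the chip and hand the real work to a taskqueue
     * thread.  The switch to and from that thread is the context
     * switch that still shows up in the statistics.
     */
    static int
    mydev_intr_filter(void *arg)
    {
            struct mydev_softc *sc = arg;

            mydev_intr_disable(sc);
            taskqueue_enqueue(taskqueue_fast, &sc->int_task);
            return (FILTER_HANDLED);
    }

    /* Task: does the RX/TX processing in thread context, then unmasks. */
    static void
    mydev_int_task(void *arg, int pending)
    {
            struct mydev_softc *sc = arg;

            mydev_rxtx(sc);
            mydev_intr_enable(sc);
    }

    static int
    mydev_setup_intr(struct mydev_softc *sc, struct resource *irq_res)
    {
            void *cookie;

            TASK_INIT(&sc->int_task, 0, mydev_int_task, sc);
            /* NULL ithread handler: the filter schedules the task itself. */
            return (bus_setup_intr(sc->dev, irq_res,
                INTR_TYPE_NET | INTR_MPSAFE, mydev_intr_filter, NULL, sc,
                &cookie));
    }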
 procs      memory      page                    disk   faults           cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad0   in     sy   cs us sy id
 1 0 0  98824K   431M     0   0   0   0     0   0   0    0    117 2172  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0    123 2176  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0    115 2175  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0    115 2197  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0    115 2175  0  1 99

Network traffic ON:

 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107548 3206  4 96  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107778 3183  5 95  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107548 3184  1 99  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107155 3182  2 98  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107945 3206  2 98  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107613 3182  7 93  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107432 3180  5 95  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107523 3181  4 96  0

Report from gprof:

granularity: each sample hit covers 16 byte(s) for 0.00% of 75.16 seconds

  %   cumulative   self                self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 41.4      31.12    31.12         0  100.00%           __mcount [1]
 36.2      58.30    27.18     54341     0.50     0.50  acpi_cpu_c1 [6]
  8.9      65.01     6.71   2521168     0.00     0.00  copyin [17]
  2.8      67.11     2.10    419006     0.01     0.01  in_cksum_skip [23]
  1.0      67.86     0.75  12236575     0.00     0.00  memcpy [29]
  0.8      68.43     0.58   9309659     0.00     0.00  uma_zalloc_arg [25]
  0.6      68.89     0.45   7293157     0.00     0.00  mb_ctor_mbuf [32]
  0.6      69.32     0.43   1008034     0.00     0.00  uma_find_refcnt [34]
  0.5      69.71     0.39   2933058     0.00     0.00  ether_output [24]
  0.5      70.07     0.36   2933058     0.00     0.00  if_transmit [38]
  0.3      70.31     0.25    504035     0.00     0.01  ip_output [18]
  0.3      70.56     0.24   2933257     0.00     0.00  bcmp [48]
  0.3      70.77     0.21    504032     0.00     0.01  m_uiotombuf [19]
  0.3      70.98     0.21   3352048     0.00     0.00  mb_dupcl [51]
  0.3      71.19     0.21   2514036     0.00     0.00  m_copym [28]
  0.3      71.39     0.20    419006     0.00     0.01  ip_fragment [21]
  0.2      71.56     0.17    504017     0.00     0.02  udp_send [16]
  0.2      71.74     0.17   2520731     0.00     0.00  bzero [53]
  0.2      71.91     0.17    504648     0.00     0.03  Xint0x80_syscall [8]
  0.2      72.07     0.16    504017     0.00     0.00  in_pcbconnect_setup [30]
  0.2      72.22     0.15    504017     0.00     0.03  sosend_dgram [15]
  0.2      72.37     0.15  25113400     0.00     0.00  critical_exit [57]
  0.2      72.51     0.14  25113400     0.00     0.00  critical_enter [59]
  0.2      72.63     0.13    504104     0.00     0.00  mb_ctor_pack [60]
  0.2      72.75     0.11   1512179     0.00     0.00  _rw_runlock [62]
  0.1      72.85     0.10    504017     0.00     0.03  kern_sendit [13]
  0.1      72.95     0.10   9311895     0.00     0.00  uma_zfree_arg [49]
  0.1      73.05     0.10    504114     0.00     0.00  free [54]
  0.1      73.14     0.10   1512161     0.00     0.00  uiomove [20]

granularity: each sample hit covers 16 byte(s) for 0.00% of 75.16 seconds

                                  called/total       parents
index  %time    self descendents  called+self    name           index
                                  called/total       children

[1]     41.4   31.12        0.00                 __mcount [1]
-----------------------------------------------
[2]     36.2    0.01       27.18                 sched_idletd [2]
                0.00       27.18   54341/54341       cpu_idle [4]
-----------------------------------------------
                0.00       27.18   54341/54341       cpu_idle_acpi [5]
[3]     36.2    0.00       27.18   54341         acpi_cpu_idle [3]
               27.18        0.00   54341/54341       acpi_cpu_c1 [6]
                0.00        0.00  108682/108682      AcpiHwRead [157]
                0.00        0.00   54341/54341       acpi_TimerDelta [653]
-----------------------------------------------
                0.00       27.18   54341/54341       sched_idletd [2]
[4]     36.2    0.00       27.18   54341         cpu_idle [4]
                0.00       27.18   54341/54341       cpu_idle_acpi [5]
                0.00        0.00   54341/54341       mp_grab_cpu_hlt [654]
-----------------------------------------------
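For reference, a polling + profiling kernel and a gprof report like the
one above are typically produced along these lines (a sketch only: the
exact options used here are not shown in the thread; MYKERNEL and re0
are placeholder names, while DEVICE_POLLING, config -p, kgmon(8) and
gprof(1) are the standard FreeBSD tools):

    # Kernel config additions:
    options         DEVICE_POLLING        # polling(4) support
    options         HZ=1000               # polling wants a fast clock

    # Build with kernel profiling enabled (config -p):
    config -p MYKERNEL
    cd ../compile/MYKERNEL && make depend && make && make install

    # At run time:
    ifconfig re0 polling                  # enable polling on the NIC
    kgmon -r -b                           # reset buffers, begin profiling
    # ... run the network load ...
    kgmon -h -p                           # halt profiling, dump gmon.out
    gprof /boot/kernel/kernel gmon.out > report.txt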