From owner-freebsd-performance@FreeBSD.ORG Sun Jan 30 12:55:37 2011
Date: Sun, 30 Jan 2011 15:55:32 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Bruce Evans
Cc: freebsd-performance@FreeBSD.org, Julian Elischer, Stefan Lambrev
Subject: Re: Interrupt performance
Message-ID: <20110130125532.GO18170@zxy.spb.ru>
In-Reply-To: <20110129233542.O20731@besplex.bde.org>

On Sat, Jan 29, 2011 at 11:54:11PM +1100, Bruce Evans wrote:

> > And I see a dramatically smaller number of context switches in the
> > Linux stats (from dstat).
>
> FreeBSD uses ithreads for most interrupts, so of course it does many
> more context switches (at least 2 per interrupt).  This doesn't make
> much difference provided there are not too many.  I think the version
> of re that you are using actually uses "fast" interrupts and a task
> queue.  This also seems to be making little difference.  You get a
> relatively lightweight "fast" interrupt followed by a context switch
> to and from the task.  IIRC, your statistics showed about twice as
> many context switches as interrupts, so the task queue isn't doing
> much to reduce the "interrupt overhead" -- it just gives context
> switches to the task instead of to an ithread.

Now I have built a kernel with polling and profiling. Network
performance with profiling compiled in (but turned off) doesn't change.
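The "fast interrupt plus taskqueue" pattern Bruce describes looks
roughly like the sketch below. This is only an illustration, not the
actual re(4) code: the mydev_* names and chip helpers are hypothetical
placeholders; the KPIs (bus_setup_intr(9), taskqueue(9)) are real.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/bus.h>
    #include <sys/taskqueue.h>

    struct mydev_softc {
            device_t        dev;
            struct task     int_task;       /* deferred interrupt work */
    };

    /* Hypothetical chip helpers, bodies not shown. */
    static void mydev_intr_disable(struct mydev_softc *);
    static void mydev_intr_enable(struct mydev_softc *);
    static void mydev_rxtx(struct mydev_softc *);

    /*
     * Filter: runs in raw interrupt context, so it only does cheap
     * work: mask the chip and hand the real work to a taskqueue
     * thread.  The switch to and from that thread is the context
     * switch that still shows up in the statistics.
     */
    static int
    mydev_intr_filter(void *arg)
    {
            struct mydev_softc *sc = arg;

            mydev_intr_disable(sc);
            taskqueue_enqueue(taskqueue_fast, &sc->int_task);
            return (FILTER_HANDLED);
    }

    /* Task: does the RX/TX processing in thread context, then unmasks. */
    static void
    mydev_int_task(void *arg, int pending)
    {
            struct mydev_softc *sc = arg;

            mydev_rxtx(sc);
            mydev_intr_enable(sc);
    }

    static int
    mydev_setup_intr(struct mydev_softc *sc, struct resource *irq_res)
    {
            void *cookie;

            TASK_INIT(&sc->int_task, 0, mydev_int_task, sc);
            /* NULL ithread handler: the filter schedules the task itself. */
            return (bus_setup_intr(sc->dev, irq_res,
                INTR_TYPE_NET | INTR_MPSAFE, mydev_intr_filter, NULL, sc,
                &cookie));
    }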
 procs      memory      page                    disk   faults           cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad0   in     sy   cs us sy id
 1 0 0  98824K   431M     0   0   0   0     0   0   0    0    117 2172  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0    123 2176  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0    115 2175  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0    115 2197  0  1 99
 0 0 0  98824K   431M     0   0   0   0     0   0   0    0    115 2175  0  1 99

Network traffic ON:

 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107548 3206  4 96  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107778 3183  5 95  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107548 3184  1 99  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107155 3182  2 98  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107945 3206  2 98  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107613 3182  7 93  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107432 3180  5 95  0
 1 0 0    100M   430M     0   0   0   0     0   0   0    0 107523 3181  4 96  0

Report from gprof:

granularity: each sample hit covers 16 byte(s) for 0.00% of 75.16 seconds

  %   cumulative   self                self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 41.4      31.12    31.12         0  100.00%           __mcount [1]
 36.2      58.30    27.18     54341     0.50     0.50  acpi_cpu_c1 [6]
  8.9      65.01     6.71   2521168     0.00     0.00  copyin [17]
  2.8      67.11     2.10    419006     0.01     0.01  in_cksum_skip [23]
  1.0      67.86     0.75  12236575     0.00     0.00  memcpy [29]
  0.8      68.43     0.58   9309659     0.00     0.00  uma_zalloc_arg [25]
  0.6      68.89     0.45   7293157     0.00     0.00  mb_ctor_mbuf [32]
  0.6      69.32     0.43   1008034     0.00     0.00  uma_find_refcnt [34]
  0.5      69.71     0.39   2933058     0.00     0.00  ether_output [24]
  0.5      70.07     0.36   2933058     0.00     0.00  if_transmit [38]
  0.3      70.31     0.25    504035     0.00     0.01  ip_output [18]
  0.3      70.56     0.24   2933257     0.00     0.00  bcmp [48]
  0.3      70.77     0.21    504032     0.00     0.01  m_uiotombuf [19]
  0.3      70.98     0.21   3352048     0.00     0.00  mb_dupcl [51]
  0.3      71.19     0.21   2514036     0.00     0.00  m_copym [28]
  0.3      71.39     0.20    419006     0.00     0.01  ip_fragment [21]
  0.2      71.56     0.17    504017     0.00     0.02  udp_send [16]
  0.2      71.74     0.17   2520731     0.00     0.00  bzero [53]
  0.2      71.91     0.17    504648     0.00     0.03  Xint0x80_syscall [8]
  0.2      72.07     0.16    504017     0.00     0.00  in_pcbconnect_setup [30]
  0.2      72.22     0.15    504017     0.00     0.03  sosend_dgram [15]
  0.2      72.37     0.15  25113400     0.00     0.00  critical_exit [57]
  0.2      72.51     0.14  25113400     0.00     0.00  critical_enter [59]
  0.2      72.63     0.13    504104     0.00     0.00  mb_ctor_pack [60]
  0.2      72.75     0.11   1512179     0.00     0.00  _rw_runlock [62]
  0.1      72.85     0.10    504017     0.00     0.03  kern_sendit [13]
  0.1      72.95     0.10   9311895     0.00     0.00  uma_zfree_arg [49]
  0.1      73.05     0.10    504114     0.00     0.00  free [54]
  0.1      73.14     0.10   1512161     0.00     0.00  uiomove [20]

granularity: each sample hit covers 16 byte(s) for 0.00% of 75.16 seconds

                                  called/total       parents
index  %time    self descendents  called+self    name           index
                                  called/total       children

[1]     41.4   31.12        0.00                 __mcount [1]
-----------------------------------------------
[2]     36.2    0.01       27.18                 sched_idletd [2]
                0.00       27.18   54341/54341       cpu_idle [4]
-----------------------------------------------
                0.00       27.18   54341/54341       cpu_idle_acpi [5]
[3]     36.2    0.00       27.18   54341         acpi_cpu_idle [3]
               27.18        0.00   54341/54341       acpi_cpu_c1 [6]
                0.00        0.00  108682/108682      AcpiHwRead [157]
                0.00        0.00   54341/54341       acpi_TimerDelta [653]
-----------------------------------------------
                0.00       27.18   54341/54341       sched_idletd [2]
[4]     36.2    0.00       27.18   54341         cpu_idle [4]
                0.00       27.18   54341/54341       cpu_idle_acpi [5]
                0.00        0.00   54341/54341       mp_grab_cpu_hlt [654]
-----------------------------------------------
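For reference, a polling + profiling kernel and a gprof report like the
one above are typically produced along these lines (a sketch only: the
exact options used here are not shown in the thread; MYKERNEL and re0
are placeholder names, while DEVICE_POLLING, config -p, kgmon(8) and
gprof(1) are the standard FreeBSD tools):

    # Kernel config additions:
    options         DEVICE_POLLING        # polling(4) support
    options         HZ=1000               # polling wants a fast clock

    # Build with kernel profiling enabled (config -p):
    config -p MYKERNEL
    cd ../compile/MYKERNEL && make depend && make && make install

    # At run time:
    ifconfig re0 polling                  # enable polling on the NIC
    kgmon -r -b                           # reset buffers, begin profiling
    # ... run the network load ...
    kgmon -h -p                           # halt profiling, dump gmon.out
    gprof /boot/kernel/kernel gmon.out > report.txt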