From owner-freebsd-smp@FreeBSD.ORG Mon Nov 17 11:11:01 2008 Return-Path: Delivered-To: freebsd-smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 685861065678 for ; Mon, 17 Nov 2008 11:11:01 +0000 (UTC) (envelope-from archimedes.gaviola@gmail.com) Received: from rv-out-0506.google.com (rv-out-0506.google.com [209.85.198.236]) by mx1.freebsd.org (Postfix) with ESMTP id 3A05D8FC1B for ; Mon, 17 Nov 2008 11:11:00 +0000 (UTC) (envelope-from archimedes.gaviola@gmail.com) Received: by rv-out-0506.google.com with SMTP id b25so2282955rvf.43 for ; Mon, 17 Nov 2008 03:11:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to :subject:cc:in-reply-to:mime-version:content-type :content-transfer-encoding:content-disposition:references; bh=fGVHgR0LABWszzv9fduqH285vli6uWOgvFgOhzs4J3Q=; b=MuS42+YF1eisZG4dwOyGA15agH5ASE+eZIWNhCd89OFWkJZflXhaE1/Q/yeIOtEs/d nYwy9V6nHQ8Y5uUfXWAJrR6C009wMNQ6Yg/t9AtB/yDPoEyliej8hm4xbTtJrxHP3RE+ Latd+u7MhXF7DLqqv17FOGhIgEU5HFvsJ9AfM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version :content-type:content-transfer-encoding:content-disposition :references; b=Udn6JI+XZq/VnqSTH8pv6xYfZ6UbRTJD4aHGcj7JY2g51CupdupAjPU/JGoMniTbwB wuKl01sT7CRNci6PxutKnQM2zQ6+jGoNl/DNAd0e5CzISFmg20DJX9r057baDgM0MWzt YqlKr6yoXAlj81PiRSXqzoIXk7i3ypLtydP0M= Received: by 10.114.174.2 with SMTP id w2mr2415221wae.195.1226920260630; Mon, 17 Nov 2008 03:11:00 -0800 (PST) Received: by 10.115.76.12 with HTTP; Mon, 17 Nov 2008 03:11:00 -0800 (PST) Message-ID: <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com> Date: Mon, 17 Nov 2008 19:11:00 +0800 From: "Archimedes Gaviola" To: "John Baldwin" In-Reply-To: <200811131128.55220.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <200811111216.37462.jhb@freebsd.org> <42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com> <200811131128.55220.jhb@freebsd.org> Cc: freebsd-smp@freebsd.org Subject: Re: CPU affinity with ULE scheduler X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Nov 2008 11:11:01 -0000 On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin wrote: > On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote: >> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin wrote: >> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote: >> >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote: >> >> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote: >> >> >> To Whom It May Concerned: >> >> >> >> >> >> Can someone explain or share about ULE scheduler (latest version 2 if >> >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing >> >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD >> >> >> scheduler and as what I have observed especially on processing high >> >> >> network load traffic on multiple CPU cores, only one CPU were being >> >> >> stressed with network interrupt while the rests are mostly in idle >> >> >> state. 
This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom >> >> >> network interface cards (bce0 and bce1). Below is the snapshot of the >> >> >> case. >> >> > >> >> > Interrupts are routed to a single CPU. Since bce0 and bce1 are both on >> > the >> >> > same interrupt (irq 23), the CPU that interrupt is routed to is going > to >> > end >> >> > up handling all the interrupts for bce0 and bce1. This not something > ULE >> > or >> >> > 4BSD have any control over. >> >> > >> >> > -- >> >> > John Baldwin >> >> > >> >> >> >> Hi John, >> >> >> >> I'm sorry for the wrong snapshot. Here's the right one with my concern. >> >> >> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND >> >> 17 root 1 171 52 0K 16K CPU0 0 54:28 95.17% idle: > cpu0 >> >> 15 root 1 171 52 0K 16K CPU2 2 55:55 93.65% idle: > cpu2 >> >> 14 root 1 171 52 0K 16K CPU3 3 58:53 93.55% idle: > cpu3 >> >> 13 root 1 171 52 0K 16K RUN 4 59:14 82.47% idle: > cpu4 >> >> 12 root 1 171 52 0K 16K RUN 5 55:42 82.23% idle: > cpu5 >> >> 16 root 1 171 52 0K 16K CPU1 1 58:13 77.78% idle: > cpu1 >> >> 11 root 1 171 52 0K 16K CPU6 6 54:08 76.17% idle: > cpu6 >> >> 36 root 1 -68 -187 0K 16K WAIT 7 8:50 65.53% >> >> irq23: bce0 bce1 >> >> 10 root 1 171 52 0K 16K CPU7 7 48:19 29.79% idle: > cpu7 >> >> 43 root 1 171 52 0K 16K pgzero 2 0:35 1.51% > pagezero >> >> 1372 root 10 20 0 16716K 5764K kserel 6 58:42 0.00% kmd >> >> 4488 root 1 96 0 30676K 4236K select 2 1:51 0.00% sshd >> >> 18 root 1 -32 -151 0K 16K WAIT 0 1:14 0.00% swi4: >> > clock s >> >> 20 root 1 -44 -163 0K 16K WAIT 0 0:30 0.00% swi1: > net >> >> 218 root 1 96 0 3852K 1376K select 0 0:23 0.00% syslogd >> >> 2171 root 1 96 0 30676K 4224K select 6 0:19 0.00% sshd >> >> >> >> Actually I was doing a network performance testing on this system with >> >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a >> >> tool to generate big amount of traffic around 600Mbps-700Mbps >> >> traversing the FreeBSD system in bi-direction, meaning both network >> >> interfaces are receiving traffic. What happened was, the CPU (cpu7) >> >> that handles the (irq 23) on both interfaces consumed big amount of >> >> CPU utilization around 65.53% in which it affects other running >> >> applications and services like sshd and httpd. It's no longer >> >> accessible when traffic is bombarded. With the current situation of my >> >> FreeBSD system with only one CPU being stressed, I was thinking of >> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought >> >> my concern has something to do with the distributions of load on >> >> multiple CPU cores handled by the scheduler especially at the network >> >> level, processing network load. So, if it is more of interrupt >> >> handling and not on the scheduler, is there a way we can optimize it? >> >> Because if it still routed only to one CPU then for me it's still >> >> inefficient. Who handles interrupt scheduling for bounding CPU in >> >> order to prevent shared IRQ? Is there any improvements with >> >> FreeBSD-7.0 with regards to interrupt handling? >> > >> > It depends. In all likelihood, the interrupts from bce0 and bce1 are both >> > hardwired to the same interrupt pin and so they will always share the same >> > ithread when using the legacy INTx interrupts. However, bce(4) parts do >> > support MSI, and if you try a newer OS snap (6.3 or later) these devices >> > should use MSI in which case each NIC would be assigned to a separate CPU. 
> I >> > would suggest trying 7.0 or a 7.1 release candidate and see if it does >> > better. >> > >> > -- >> > John Baldwin >> > >> >> Hi John, >> >> I try 7.0 release and each network interface were already allocated >> separately on different CPU. Here, MSI is already working. >> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND >> 12 root 1 171 ki31 0K 16K CPU6 6 123:55 100.00% idle: > cpu6 >> 15 root 1 171 ki31 0K 16K CPU3 3 123:54 100.00% idle: > cpu3 >> 14 root 1 171 ki31 0K 16K CPU4 4 123:26 100.00% idle: > cpu4 >> 16 root 1 171 ki31 0K 16K CPU2 2 123:15 100.00% idle: > cpu2 >> 17 root 1 171 ki31 0K 16K CPU1 1 123:15 100.00% idle: > cpu1 >> 37 root 1 -68 - 0K 16K CPU7 7 9:09 100.00% irq256: > bce0 >> 13 root 1 171 ki31 0K 16K CPU5 5 123:49 99.07% idle: cpu5 >> 40 root 1 -68 - 0K 16K WAIT 0 4:40 51.17% irq257: > bce1 >> 18 root 1 171 ki31 0K 16K RUN 0 117:48 49.37% idle: cpu0 >> 11 root 1 171 ki31 0K 16K RUN 7 115:25 0.00% idle: cpu7 >> 19 root 1 -32 - 0K 16K WAIT 0 0:39 0.00% swi4: > clock s >> 14367 root 1 44 0 5176K 3104K select 2 0:01 0.00% dhcpd >> 22 root 1 -16 - 0K 16K - 3 0:01 0.00% yarrow >> 25 root 1 -24 - 0K 16K WAIT 0 0:00 0.00% swi6: > Giant t >> 11658 root 1 44 0 32936K 4540K select 1 0:00 0.00% sshd >> 14224 root 1 44 0 32936K 4540K select 5 0:00 0.00% sshd >> 41 root 1 -60 - 0K 16K WAIT 0 0:00 0.00% irq1: > atkbd0 >> 4 root 1 -8 - 0K 16K - 2 0:00 0.00% g_down >> >> The bce0 interface interrupt (irq256) gets stressed out which already >> have 100% of CPU7 while CPU0 is around 51.17%. Any more >> recommendations? Is there anything we can do about optimization with >> MSI? > > Well, on 7.x you can try turning net.isr.direct off (sysctl). However, it > seems you are hammering your bce0 interface. You might want to try using > polling on bce0 and seeing if it keeps up with the traffic better. > > -- > John Baldwin > With net.isr.direct=0, my IBM system shows lower CPU utilization on each interface (bce0 and bce1), but swi1: net increases its utilization. Can you explain what's happening here? What does net.isr.direct do that decreases the CPU utilization on the interfaces? I would really like to know what happens to packets internally, from reception on the interfaces, through the device interrupt, up to the software interrupt level, because I am confused by the effect of enabling/disabling net.isr.direct in sysctl. Is there a tool we can use to trace this path, just to see which part of the kernel is the bottleneck, especially when net.isr.direct=1? By the way, with device polling enabled the system experienced packet errors and the interface throughput was worse, so I avoid using it.
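net.isr.direct is a plain integer sysctl, so besides "sysctl net.isr.direct=0" on the command line it can also be read and flipped programmatically with sysctlbyname(3). A minimal C sketch follows; the sysctl name is the one discussed in this thread, everything else is illustrative and assumes a 7.x system where the knob exists:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
        int cur, val;
        size_t len = sizeof(cur);

        /* Read the current value of net.isr.direct. */
        if (sysctlbyname("net.isr.direct", &cur, &len, NULL, 0) == -1)
                err(1, "sysctlbyname(net.isr.direct)");
        printf("net.isr.direct is %d\n", cur);

        /* Optionally set a new value (0 or 1); this part needs root. */
        if (argc > 1) {
                val = atoi(argv[1]);
                if (sysctlbyname("net.isr.direct", NULL, NULL, &val,
                    sizeof(val)) == -1)
                        err(1, "sysctlbyname(net.isr.direct)");
                printf("net.isr.direct set to %d\n", val);
        }
        return (0);
}

Reading works as any user; writing requires root, exactly as with sysctl(8).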
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 16 root 1 171 ki31 0K 16K CPU10 a 86:06 89.06% idle: cpu10 27 root 1 -44 - 0K 16K CPU1 1 34:37 82.67% swi1: net 52 root 1 -68 - 0K 16K WAIT b 51:59 59.77% irq32: bce1 15 root 1 171 ki31 0K 16K RUN b 69:28 43.16% idle: cpu11 25 root 1 171 ki31 0K 16K RUN 1 115:35 24.27% idle: cpu1 51 root 1 -68 - 0K 16K CPU10 a 35:21 13.48% irq31: bce0 Regards, Archimedes From owner-freebsd-smp@FreeBSD.ORG Mon Nov 17 11:36:41 2008 Return-Path: Delivered-To: freebsd-smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 800FF1065673 for ; Mon, 17 Nov 2008 11:36:41 +0000 (UTC) (envelope-from archimedes.gaviola@gmail.com) Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.179]) by mx1.freebsd.org (Postfix) with ESMTP id 519CD8FC23 for ; Mon, 17 Nov 2008 11:36:41 +0000 (UTC) (envelope-from archimedes.gaviola@gmail.com) Received: by wa-out-1112.google.com with SMTP id m34so1311452wag.27 for ; Mon, 17 Nov 2008 03:36:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from:to :subject:cc:in-reply-to:mime-version:content-type :content-transfer-encoding:content-disposition:references; bh=jIzzl4DLAEPTzJU02EzRDyuAEsm7Nd+/CAxlg1Ahg+M=; b=GspAKjyFRLCc6G6IEnbYLDqRa9dXNG7iKCtidCFx/tMZhOLxzDod+jhxte1ahHhS7i VF+qw7r/9TFZPoYCPG8croUUxe7AHWl+qzUQkTjIJe/zP+FN5XvoQxBlLEC2m3j4siNu ul6P9/8u/6vQLgBetkF6dGLCvGnbZRsHB9e2o= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=message-id:date:from:to:subject:cc:in-reply-to:mime-version :content-type:content-transfer-encoding:content-disposition :references; b=ipflqQFUuAw5ugQA5Ll6g+ep3hY/sBt4FLuWFdure3NBD0NJJaJhLv0WNkYt2TgBoE h3+NcQjuLVg2oVfLiVODCN3/yUchLI+44GFPAyGY6LT+73HqzCZaI/KX7hfr4PUWUd2i AUZ1VWPMoqzX3tXPrLWFq+2HdhPMnehE9jR3c= Received: by 10.114.39.5 with SMTP id m5mr2427993wam.214.1226921801032; Mon, 17 Nov 2008 03:36:41 -0800 (PST) Received: by 10.115.76.12 with HTTP; Mon, 17 Nov 2008 03:36:40 -0800 (PST) Message-ID: <42e3d810811170336rf0a0357sf32035e8bd1489e9@mail.gmail.com> Date: Mon, 17 Nov 2008 19:36:40 +0800 From: "Archimedes Gaviola" To: "John Baldwin" In-Reply-To: <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <200811111216.37462.jhb@freebsd.org> <42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com> <200811131128.55220.jhb@freebsd.org> <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com> Cc: freebsd-smp@freebsd.org Subject: Re: CPU affinity with ULE scheduler X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Nov 2008 11:36:41 -0000 On Mon, Nov 17, 2008 at 7:11 PM, Archimedes Gaviola wrote: > On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin wrote: >> On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote: >>> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin wrote: >>> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote: >>> >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote: >>> >> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote: >>> >> >> To Whom It May Concerned: >>> >> >> 
>>> >> >> Can someone explain or share about ULE scheduler (latest version 2 if >>> >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing >>> >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD >>> >> >> scheduler and as what I have observed especially on processing high >>> >> >> network load traffic on multiple CPU cores, only one CPU were being >>> >> >> stressed with network interrupt while the rests are mostly in idle >>> >> >> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom >>> >> >> network interface cards (bce0 and bce1). Below is the snapshot of the >>> >> >> case. >>> >> > >>> >> > Interrupts are routed to a single CPU. Since bce0 and bce1 are both on >>> > the >>> >> > same interrupt (irq 23), the CPU that interrupt is routed to is going >> to >>> > end >>> >> > up handling all the interrupts for bce0 and bce1. This not something >> ULE >>> > or >>> >> > 4BSD have any control over. >>> >> > >>> >> > -- >>> >> > John Baldwin >>> >> > >>> >> >>> >> Hi John, >>> >> >>> >> I'm sorry for the wrong snapshot. Here's the right one with my concern. >>> >> >>> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND >>> >> 17 root 1 171 52 0K 16K CPU0 0 54:28 95.17% idle: >> cpu0 >>> >> 15 root 1 171 52 0K 16K CPU2 2 55:55 93.65% idle: >> cpu2 >>> >> 14 root 1 171 52 0K 16K CPU3 3 58:53 93.55% idle: >> cpu3 >>> >> 13 root 1 171 52 0K 16K RUN 4 59:14 82.47% idle: >> cpu4 >>> >> 12 root 1 171 52 0K 16K RUN 5 55:42 82.23% idle: >> cpu5 >>> >> 16 root 1 171 52 0K 16K CPU1 1 58:13 77.78% idle: >> cpu1 >>> >> 11 root 1 171 52 0K 16K CPU6 6 54:08 76.17% idle: >> cpu6 >>> >> 36 root 1 -68 -187 0K 16K WAIT 7 8:50 65.53% >>> >> irq23: bce0 bce1 >>> >> 10 root 1 171 52 0K 16K CPU7 7 48:19 29.79% idle: >> cpu7 >>> >> 43 root 1 171 52 0K 16K pgzero 2 0:35 1.51% >> pagezero >>> >> 1372 root 10 20 0 16716K 5764K kserel 6 58:42 0.00% kmd >>> >> 4488 root 1 96 0 30676K 4236K select 2 1:51 0.00% sshd >>> >> 18 root 1 -32 -151 0K 16K WAIT 0 1:14 0.00% swi4: >>> > clock s >>> >> 20 root 1 -44 -163 0K 16K WAIT 0 0:30 0.00% swi1: >> net >>> >> 218 root 1 96 0 3852K 1376K select 0 0:23 0.00% syslogd >>> >> 2171 root 1 96 0 30676K 4224K select 6 0:19 0.00% sshd >>> >> >>> >> Actually I was doing a network performance testing on this system with >>> >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a >>> >> tool to generate big amount of traffic around 600Mbps-700Mbps >>> >> traversing the FreeBSD system in bi-direction, meaning both network >>> >> interfaces are receiving traffic. What happened was, the CPU (cpu7) >>> >> that handles the (irq 23) on both interfaces consumed big amount of >>> >> CPU utilization around 65.53% in which it affects other running >>> >> applications and services like sshd and httpd. It's no longer >>> >> accessible when traffic is bombarded. With the current situation of my >>> >> FreeBSD system with only one CPU being stressed, I was thinking of >>> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought >>> >> my concern has something to do with the distributions of load on >>> >> multiple CPU cores handled by the scheduler especially at the network >>> >> level, processing network load. So, if it is more of interrupt >>> >> handling and not on the scheduler, is there a way we can optimize it? >>> >> Because if it still routed only to one CPU then for me it's still >>> >> inefficient. Who handles interrupt scheduling for bounding CPU in >>> >> order to prevent shared IRQ? 
Is there any improvements with >>> >> FreeBSD-7.0 with regards to interrupt handling? >>> > >>> > It depends. In all likelihood, the interrupts from bce0 and bce1 are both >>> > hardwired to the same interrupt pin and so they will always share the same >>> > ithread when using the legacy INTx interrupts. However, bce(4) parts do >>> > support MSI, and if you try a newer OS snap (6.3 or later) these devices >>> > should use MSI in which case each NIC would be assigned to a separate CPU. >> I >>> > would suggest trying 7.0 or a 7.1 release candidate and see if it does >>> > better. >>> > >>> > -- >>> > John Baldwin >>> > >>> >>> Hi John, >>> >>> I try 7.0 release and each network interface were already allocated >>> separately on different CPU. Here, MSI is already working. >>> >>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND >>> 12 root 1 171 ki31 0K 16K CPU6 6 123:55 100.00% idle: >> cpu6 >>> 15 root 1 171 ki31 0K 16K CPU3 3 123:54 100.00% idle: >> cpu3 >>> 14 root 1 171 ki31 0K 16K CPU4 4 123:26 100.00% idle: >> cpu4 >>> 16 root 1 171 ki31 0K 16K CPU2 2 123:15 100.00% idle: >> cpu2 >>> 17 root 1 171 ki31 0K 16K CPU1 1 123:15 100.00% idle: >> cpu1 >>> 37 root 1 -68 - 0K 16K CPU7 7 9:09 100.00% irq256: >> bce0 >>> 13 root 1 171 ki31 0K 16K CPU5 5 123:49 99.07% idle: cpu5 >>> 40 root 1 -68 - 0K 16K WAIT 0 4:40 51.17% irq257: >> bce1 >>> 18 root 1 171 ki31 0K 16K RUN 0 117:48 49.37% idle: cpu0 >>> 11 root 1 171 ki31 0K 16K RUN 7 115:25 0.00% idle: cpu7 >>> 19 root 1 -32 - 0K 16K WAIT 0 0:39 0.00% swi4: >> clock s >>> 14367 root 1 44 0 5176K 3104K select 2 0:01 0.00% dhcpd >>> 22 root 1 -16 - 0K 16K - 3 0:01 0.00% yarrow >>> 25 root 1 -24 - 0K 16K WAIT 0 0:00 0.00% swi6: >> Giant t >>> 11658 root 1 44 0 32936K 4540K select 1 0:00 0.00% sshd >>> 14224 root 1 44 0 32936K 4540K select 5 0:00 0.00% sshd >>> 41 root 1 -60 - 0K 16K WAIT 0 0:00 0.00% irq1: >> atkbd0 >>> 4 root 1 -8 - 0K 16K - 2 0:00 0.00% g_down >>> >>> The bce0 interface interrupt (irq256) gets stressed out which already >>> have 100% of CPU7 while CPU0 is around 51.17%. Any more >>> recommendations? Is there anything we can do about optimization with >>> MSI? >> >> Well, on 7.x you can try turning net.isr.direct off (sysctl). However, it >> seems you are hammering your bce0 interface. You might want to try using >> polling on bce0 and seeing if it keeps up with the traffic better. >> >> -- >> John Baldwin >> > > With net.isr.direct=0, my IBM system lessens CPU utilization per > interface (bce0 and bce1) but swi1:net increase its utilization. > Can you explained what's happening here? What does net.isr.direct do > with the decrease of CPU utilization on its interface? I really wanted > to know what happened internally during the packets being processed > and received by the interfaces then to the device interrupt up to the > software interrupt level because I am confused when enabling/disabling > net.isr.direct in sysctl. Is there a tool that can we used to trace > this process just to be able to know which part of the kernel internal > is doing the bottleneck especially when net.isr.direct=1? By the way > with device polling enabled, the system experienced packet errors and > the interface throughput is worst, so I avoid using it though. 
> > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > > > 16 root 1 171 ki31 0K 16K CPU10 a 86:06 89.06% idle: cpu10 > 27 root 1 -44 - 0K 16K CPU1 1 34:37 82.67% swi1: net > 52 root 1 -68 - 0K 16K WAIT b 51:59 59.77% irq32: bce1 > 15 root 1 171 ki31 0K 16K RUN b 69:28 43.16% idle: cpu11 > 25 root 1 171 ki31 0K 16K RUN 1 115:35 24.27% idle: cpu1 > 51 root 1 -68 - 0K 16K CPU10 a 35:21 13.48% irq31: bce0 > > > Regards, > Archimedes > One more thing, I observed that when net.isr.direct=1, bce0 is using irq256 and bce1 is using irq257, while with net.isr.direct=0, bce0 is using irq31 and bce1 is using irq32. What makes the difference? From owner-freebsd-smp@FreeBSD.ORG Mon Nov 17 13:18:02 2008 Return-Path: Delivered-To: freebsd-smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 48EDA1065673 for ; Mon, 17 Nov 2008 13:18:02 +0000 (UTC) (envelope-from freebsd-smp@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id C394F8FC1A for ; Mon, 17 Nov 2008 13:18:01 +0000 (UTC) (envelope-from freebsd-smp@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1L23zK-0005Rg-CP for freebsd-smp@freebsd.org; Mon, 17 Nov 2008 13:17:58 +0000 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 17 Nov 2008 13:17:58 +0000 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 17 Nov 2008 13:17:58 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-smp@freebsd.org From: Ivan Voras Date: Mon, 17 Nov 2008 14:18:36 +0100 Lines: 43 Message-ID: References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <200811111216.37462.jhb@freebsd.org> <42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com> <200811131128.55220.jhb@freebsd.org> <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig72EB5AC1004F9F67F4407EC7" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Thunderbird 2.0.0.17 (X11/20080925) In-Reply-To: <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com> X-Enigmail-Version: 0.95.0 Sender: news Subject: Re: CPU affinity with ULE scheduler X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Nov 2008 13:18:02 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig72EB5AC1004F9F67F4407EC7 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Archimedes Gaviola wrote: > With net.isr.direct=0, my IBM system shows lower CPU utilization on each > interface (bce0 and bce1), but swi1: net increases its utilization. > Can you explain what's happening here? What does net.isr.direct do > that decreases the CPU utilization on the interfaces? The system has a choice between processing the packets in the interrupt handler (the "irq:bce" process) or in a dedicated network process (the "swi:net" process). This is about protocol handling, not simply receiving packets. With net.isr.direct you're toggling between those two options.
If "direct" is 1, the packets are processed in the interrupt handler; if it's 0, the processing is delegated to swi. It's set to 1 by default because this setting should yield best latency. In both cases the code path a packet must go through is very similar: it has to be received, then processed through firewalls and network stack code, then delivered to application(s), so it's a serial process. There are things that could be better parallelized in the stack and people are working on them, but they will not be finished any time soon. --------------enig72EB5AC1004F9F67F4407EC7 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJIW8sldnAQVacBcgRAnPQAKC+5qlyAtI+mTT5eFP4te2BX8EWXgCg+REw Ff9Lv7GNlBhrtGNsp9Ojkss= =0AJK -----END PGP SIGNATURE----- --------------enig72EB5AC1004F9F67F4407EC7-- From owner-freebsd-smp@FreeBSD.ORG Mon Nov 17 21:13:50 2008 Return-Path: Delivered-To: freebsd-smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6F8DD1065677 for ; Mon, 17 Nov 2008 21:13:50 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id A4D738FC14 for ; Mon, 17 Nov 2008 21:13:49 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from localhost.corp.yahoo.com (john@localhost [IPv6:::1]) (authenticated bits=0) by server.baldwin.cx (8.14.3/8.14.3) with ESMTP id mAHLDgur033788; Mon, 17 Nov 2008 16:13:42 -0500 (EST) (envelope-from jhb@freebsd.org) From: John Baldwin To: "Archimedes Gaviola" Date: Mon, 17 Nov 2008 16:09:15 -0500 User-Agent: KMail/1.9.7 References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <200811131128.55220.jhb@freebsd.org> <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com> In-Reply-To: <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200811171609.15913.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [IPv6:::1]); Mon, 17 Nov 2008 16:13:43 -0500 (EST) X-Virus-Scanned: ClamAV 0.93.1/8642/Sun Nov 16 23:01:08 2008 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00,NO_RELAYS autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx Cc: freebsd-smp@freebsd.org Subject: Re: CPU affinity with ULE scheduler X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Nov 2008 21:13:50 -0000 On Monday 17 November 2008 06:11:00 am Archimedes Gaviola wrote: > On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin wrote: > > On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote: > >> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin wrote: > >> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote: > >> >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote: > >> >> > On Monday 10 November 2008 03:33:23 am 
Archimedes Gaviola wrote: > >> >> >> To Whom It May Concerned: > >> >> >> > >> >> >> Can someone explain or share about ULE scheduler (latest version 2 if > >> >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing > >> >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD > >> >> >> scheduler and as what I have observed especially on processing high > >> >> >> network load traffic on multiple CPU cores, only one CPU were being > >> >> >> stressed with network interrupt while the rests are mostly in idle > >> >> >> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom > >> >> >> network interface cards (bce0 and bce1). Below is the snapshot of the > >> >> >> case. > >> >> > > >> >> > Interrupts are routed to a single CPU. Since bce0 and bce1 are both on > >> > the > >> >> > same interrupt (irq 23), the CPU that interrupt is routed to is going > > to > >> > end > >> >> > up handling all the interrupts for bce0 and bce1. This not something > > ULE > >> > or > >> >> > 4BSD have any control over. > >> >> > > >> >> > -- > >> >> > John Baldwin > >> >> > > >> >> > >> >> Hi John, > >> >> > >> >> I'm sorry for the wrong snapshot. Here's the right one with my concern. > >> >> > >> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > >> >> 17 root 1 171 52 0K 16K CPU0 0 54:28 95.17% idle: > > cpu0 > >> >> 15 root 1 171 52 0K 16K CPU2 2 55:55 93.65% idle: > > cpu2 > >> >> 14 root 1 171 52 0K 16K CPU3 3 58:53 93.55% idle: > > cpu3 > >> >> 13 root 1 171 52 0K 16K RUN 4 59:14 82.47% idle: > > cpu4 > >> >> 12 root 1 171 52 0K 16K RUN 5 55:42 82.23% idle: > > cpu5 > >> >> 16 root 1 171 52 0K 16K CPU1 1 58:13 77.78% idle: > > cpu1 > >> >> 11 root 1 171 52 0K 16K CPU6 6 54:08 76.17% idle: > > cpu6 > >> >> 36 root 1 -68 -187 0K 16K WAIT 7 8:50 65.53% > >> >> irq23: bce0 bce1 > >> >> 10 root 1 171 52 0K 16K CPU7 7 48:19 29.79% idle: > > cpu7 > >> >> 43 root 1 171 52 0K 16K pgzero 2 0:35 1.51% > > pagezero > >> >> 1372 root 10 20 0 16716K 5764K kserel 6 58:42 0.00% kmd > >> >> 4488 root 1 96 0 30676K 4236K select 2 1:51 0.00% sshd > >> >> 18 root 1 -32 -151 0K 16K WAIT 0 1:14 0.00% swi4: > >> > clock s > >> >> 20 root 1 -44 -163 0K 16K WAIT 0 0:30 0.00% swi1: > > net > >> >> 218 root 1 96 0 3852K 1376K select 0 0:23 0.00% syslogd > >> >> 2171 root 1 96 0 30676K 4224K select 6 0:19 0.00% sshd > >> >> > >> >> Actually I was doing a network performance testing on this system with > >> >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a > >> >> tool to generate big amount of traffic around 600Mbps-700Mbps > >> >> traversing the FreeBSD system in bi-direction, meaning both network > >> >> interfaces are receiving traffic. What happened was, the CPU (cpu7) > >> >> that handles the (irq 23) on both interfaces consumed big amount of > >> >> CPU utilization around 65.53% in which it affects other running > >> >> applications and services like sshd and httpd. It's no longer > >> >> accessible when traffic is bombarded. With the current situation of my > >> >> FreeBSD system with only one CPU being stressed, I was thinking of > >> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought > >> >> my concern has something to do with the distributions of load on > >> >> multiple CPU cores handled by the scheduler especially at the network > >> >> level, processing network load. So, if it is more of interrupt > >> >> handling and not on the scheduler, is there a way we can optimize it? 
> >> >> Because if it still routed only to one CPU then for me it's still > >> >> inefficient. Who handles interrupt scheduling for bounding CPU in > >> >> order to prevent shared IRQ? Is there any improvements with > >> >> FreeBSD-7.0 with regards to interrupt handling? > >> > > >> > It depends. In all likelihood, the interrupts from bce0 and bce1 are both > >> > hardwired to the same interrupt pin and so they will always share the same > >> > ithread when using the legacy INTx interrupts. However, bce(4) parts do > >> > support MSI, and if you try a newer OS snap (6.3 or later) these devices > >> > should use MSI in which case each NIC would be assigned to a separate CPU. > > I > >> > would suggest trying 7.0 or a 7.1 release candidate and see if it does > >> > better. > >> > > >> > -- > >> > John Baldwin > >> > > >> > >> Hi John, > >> > >> I try 7.0 release and each network interface were already allocated > >> separately on different CPU. Here, MSI is already working. > >> > >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > >> 12 root 1 171 ki31 0K 16K CPU6 6 123:55 100.00% idle: > > cpu6 > >> 15 root 1 171 ki31 0K 16K CPU3 3 123:54 100.00% idle: > > cpu3 > >> 14 root 1 171 ki31 0K 16K CPU4 4 123:26 100.00% idle: > > cpu4 > >> 16 root 1 171 ki31 0K 16K CPU2 2 123:15 100.00% idle: > > cpu2 > >> 17 root 1 171 ki31 0K 16K CPU1 1 123:15 100.00% idle: > > cpu1 > >> 37 root 1 -68 - 0K 16K CPU7 7 9:09 100.00% irq256: > > bce0 > >> 13 root 1 171 ki31 0K 16K CPU5 5 123:49 99.07% idle: cpu5 > >> 40 root 1 -68 - 0K 16K WAIT 0 4:40 51.17% irq257: > > bce1 > >> 18 root 1 171 ki31 0K 16K RUN 0 117:48 49.37% idle: cpu0 > >> 11 root 1 171 ki31 0K 16K RUN 7 115:25 0.00% idle: cpu7 > >> 19 root 1 -32 - 0K 16K WAIT 0 0:39 0.00% swi4: > > clock s > >> 14367 root 1 44 0 5176K 3104K select 2 0:01 0.00% dhcpd > >> 22 root 1 -16 - 0K 16K - 3 0:01 0.00% yarrow > >> 25 root 1 -24 - 0K 16K WAIT 0 0:00 0.00% swi6: > > Giant t > >> 11658 root 1 44 0 32936K 4540K select 1 0:00 0.00% sshd > >> 14224 root 1 44 0 32936K 4540K select 5 0:00 0.00% sshd > >> 41 root 1 -60 - 0K 16K WAIT 0 0:00 0.00% irq1: > > atkbd0 > >> 4 root 1 -8 - 0K 16K - 2 0:00 0.00% g_down > >> > >> The bce0 interface interrupt (irq256) gets stressed out which already > >> have 100% of CPU7 while CPU0 is around 51.17%. Any more > >> recommendations? Is there anything we can do about optimization with > >> MSI? > > > > Well, on 7.x you can try turning net.isr.direct off (sysctl). However, it > > seems you are hammering your bce0 interface. You might want to try using > > polling on bce0 and seeing if it keeps up with the traffic better. > > > > -- > > John Baldwin > > > > With net.isr.direct=0, my IBM system lessens CPU utilization per > interface (bce0 and bce1) but swi1:net increase its utilization. > Can you explained what's happening here? What does net.isr.direct do > with the decrease of CPU utilization on its interface? I really wanted > to know what happened internally during the packets being processed > and received by the interfaces then to the device interrupt up to the > software interrupt level because I am confused when enabling/disabling > net.isr.direct in sysctl. Is there a tool that can we used to trace > this process just to be able to know which part of the kernel internal > is doing the bottleneck especially when net.isr.direct=1? By the way > with device polling enabled, the system experienced packet errors and > the interface throughput is worst, so I avoid using it though. 
> > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > > 16 root 1 171 ki31 0K 16K CPU10 a 86:06 89.06% idle: cpu10 > 27 root 1 -44 - 0K 16K CPU1 1 34:37 82.67% swi1: net > 52 root 1 -68 - 0K 16K WAIT b 51:59 59.77% irq32: bce1 > 15 root 1 171 ki31 0K 16K RUN b 69:28 43.16% idle: cpu11 > 25 root 1 171 ki31 0K 16K RUN 1 115:35 24.27% idle: cpu1 > 51 root 1 -68 - 0K 16K CPU10 a 35:21 13.48% irq31: bce0 With net.isr.direct=1, the ithread tries to pass the received packets up to IP/UDP/TCP/socket directly. With net.isr.direct=0, the ithread places received packets on a queue and sends a signal to 'sw1: net'. The swi thread wakes up, pulls the packets off of the queue and sends them to IP/UDP/TCP/socket. -- John Baldwin From owner-freebsd-smp@FreeBSD.ORG Mon Nov 17 21:13:57 2008 Return-Path: Delivered-To: freebsd-smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 083691065678 for ; Mon, 17 Nov 2008 21:13:57 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id DD8398FC19 for ; Mon, 17 Nov 2008 21:13:55 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from localhost.corp.yahoo.com (john@localhost [IPv6:::1]) (authenticated bits=0) by server.baldwin.cx (8.14.3/8.14.3) with ESMTP id mAHLDgus033788; Mon, 17 Nov 2008 16:13:48 -0500 (EST) (envelope-from jhb@freebsd.org) From: John Baldwin To: "Archimedes Gaviola" Date: Mon, 17 Nov 2008 16:09:54 -0500 User-Agent: KMail/1.9.7 References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com> <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com> <42e3d810811170336rf0a0357sf32035e8bd1489e9@mail.gmail.com> In-Reply-To: <42e3d810811170336rf0a0357sf32035e8bd1489e9@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200811171609.54527.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [IPv6:::1]); Mon, 17 Nov 2008 16:13:49 -0500 (EST) X-Virus-Scanned: ClamAV 0.93.1/8642/Sun Nov 16 23:01:08 2008 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00,NO_RELAYS autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx Cc: freebsd-smp@freebsd.org Subject: Re: CPU affinity with ULE scheduler X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Nov 2008 21:13:57 -0000 On Monday 17 November 2008 06:36:40 am Archimedes Gaviola wrote: > On Mon, Nov 17, 2008 at 7:11 PM, Archimedes Gaviola > wrote: > > On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin wrote: > >> On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote: > >>> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin wrote: > >>> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote: > >>> >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin wrote: > >>> >> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote: > >>> >> >> To Whom It May Concerned: > >>> >> >> > >>> >> >> Can someone explain or share about ULE scheduler (latest version 2 if > >>> >> >> I'm not mistaken) dealing with CPU affinity? 
Is there any existing > >>> >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD > >>> >> >> scheduler and as what I have observed especially on processing high > >>> >> >> network load traffic on multiple CPU cores, only one CPU were being > >>> >> >> stressed with network interrupt while the rests are mostly in idle > >>> >> >> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom > >>> >> >> network interface cards (bce0 and bce1). Below is the snapshot of the > >>> >> >> case. > >>> >> > > >>> >> > Interrupts are routed to a single CPU. Since bce0 and bce1 are both on > >>> > the > >>> >> > same interrupt (irq 23), the CPU that interrupt is routed to is going > >> to > >>> > end > >>> >> > up handling all the interrupts for bce0 and bce1. This not something > >> ULE > >>> > or > >>> >> > 4BSD have any control over. > >>> >> > > >>> >> > -- > >>> >> > John Baldwin > >>> >> > > >>> >> > >>> >> Hi John, > >>> >> > >>> >> I'm sorry for the wrong snapshot. Here's the right one with my concern. > >>> >> > >>> >> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > >>> >> 17 root 1 171 52 0K 16K CPU0 0 54:28 95.17% idle: > >> cpu0 > >>> >> 15 root 1 171 52 0K 16K CPU2 2 55:55 93.65% idle: > >> cpu2 > >>> >> 14 root 1 171 52 0K 16K CPU3 3 58:53 93.55% idle: > >> cpu3 > >>> >> 13 root 1 171 52 0K 16K RUN 4 59:14 82.47% idle: > >> cpu4 > >>> >> 12 root 1 171 52 0K 16K RUN 5 55:42 82.23% idle: > >> cpu5 > >>> >> 16 root 1 171 52 0K 16K CPU1 1 58:13 77.78% idle: > >> cpu1 > >>> >> 11 root 1 171 52 0K 16K CPU6 6 54:08 76.17% idle: > >> cpu6 > >>> >> 36 root 1 -68 -187 0K 16K WAIT 7 8:50 65.53% > >>> >> irq23: bce0 bce1 > >>> >> 10 root 1 171 52 0K 16K CPU7 7 48:19 29.79% idle: > >> cpu7 > >>> >> 43 root 1 171 52 0K 16K pgzero 2 0:35 1.51% > >> pagezero > >>> >> 1372 root 10 20 0 16716K 5764K kserel 6 58:42 0.00% kmd > >>> >> 4488 root 1 96 0 30676K 4236K select 2 1:51 0.00% sshd > >>> >> 18 root 1 -32 -151 0K 16K WAIT 0 1:14 0.00% swi4: > >>> > clock s > >>> >> 20 root 1 -44 -163 0K 16K WAIT 0 0:30 0.00% swi1: > >> net > >>> >> 218 root 1 96 0 3852K 1376K select 0 0:23 0.00% syslogd > >>> >> 2171 root 1 96 0 30676K 4224K select 6 0:19 0.00% sshd > >>> >> > >>> >> Actually I was doing a network performance testing on this system with > >>> >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a > >>> >> tool to generate big amount of traffic around 600Mbps-700Mbps > >>> >> traversing the FreeBSD system in bi-direction, meaning both network > >>> >> interfaces are receiving traffic. What happened was, the CPU (cpu7) > >>> >> that handles the (irq 23) on both interfaces consumed big amount of > >>> >> CPU utilization around 65.53% in which it affects other running > >>> >> applications and services like sshd and httpd. It's no longer > >>> >> accessible when traffic is bombarded. With the current situation of my > >>> >> FreeBSD system with only one CPU being stressed, I was thinking of > >>> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought > >>> >> my concern has something to do with the distributions of load on > >>> >> multiple CPU cores handled by the scheduler especially at the network > >>> >> level, processing network load. So, if it is more of interrupt > >>> >> handling and not on the scheduler, is there a way we can optimize it? > >>> >> Because if it still routed only to one CPU then for me it's still > >>> >> inefficient. 
Who handles interrupt scheduling for bounding CPU in > >>> >> order to prevent shared IRQ? Is there any improvements with > >>> >> FreeBSD-7.0 with regards to interrupt handling? > >>> > > >>> > It depends. In all likelihood, the interrupts from bce0 and bce1 are both > >>> > hardwired to the same interrupt pin and so they will always share the same > >>> > ithread when using the legacy INTx interrupts. However, bce(4) parts do > >>> > support MSI, and if you try a newer OS snap (6.3 or later) these devices > >>> > should use MSI in which case each NIC would be assigned to a separate CPU. > >> I > >>> > would suggest trying 7.0 or a 7.1 release candidate and see if it does > >>> > better. > >>> > > >>> > -- > >>> > John Baldwin > >>> > > >>> > >>> Hi John, > >>> > >>> I try 7.0 release and each network interface were already allocated > >>> separately on different CPU. Here, MSI is already working. > >>> > >>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > >>> 12 root 1 171 ki31 0K 16K CPU6 6 123:55 100.00% idle: > >> cpu6 > >>> 15 root 1 171 ki31 0K 16K CPU3 3 123:54 100.00% idle: > >> cpu3 > >>> 14 root 1 171 ki31 0K 16K CPU4 4 123:26 100.00% idle: > >> cpu4 > >>> 16 root 1 171 ki31 0K 16K CPU2 2 123:15 100.00% idle: > >> cpu2 > >>> 17 root 1 171 ki31 0K 16K CPU1 1 123:15 100.00% idle: > >> cpu1 > >>> 37 root 1 -68 - 0K 16K CPU7 7 9:09 100.00% irq256: > >> bce0 > >>> 13 root 1 171 ki31 0K 16K CPU5 5 123:49 99.07% idle: cpu5 > >>> 40 root 1 -68 - 0K 16K WAIT 0 4:40 51.17% irq257: > >> bce1 > >>> 18 root 1 171 ki31 0K 16K RUN 0 117:48 49.37% idle: cpu0 > >>> 11 root 1 171 ki31 0K 16K RUN 7 115:25 0.00% idle: cpu7 > >>> 19 root 1 -32 - 0K 16K WAIT 0 0:39 0.00% swi4: > >> clock s > >>> 14367 root 1 44 0 5176K 3104K select 2 0:01 0.00% dhcpd > >>> 22 root 1 -16 - 0K 16K - 3 0:01 0.00% yarrow > >>> 25 root 1 -24 - 0K 16K WAIT 0 0:00 0.00% swi6: > >> Giant t > >>> 11658 root 1 44 0 32936K 4540K select 1 0:00 0.00% sshd > >>> 14224 root 1 44 0 32936K 4540K select 5 0:00 0.00% sshd > >>> 41 root 1 -60 - 0K 16K WAIT 0 0:00 0.00% irq1: > >> atkbd0 > >>> 4 root 1 -8 - 0K 16K - 2 0:00 0.00% g_down > >>> > >>> The bce0 interface interrupt (irq256) gets stressed out which already > >>> have 100% of CPU7 while CPU0 is around 51.17%. Any more > >>> recommendations? Is there anything we can do about optimization with > >>> MSI? > >> > >> Well, on 7.x you can try turning net.isr.direct off (sysctl). However, it > >> seems you are hammering your bce0 interface. You might want to try using > >> polling on bce0 and seeing if it keeps up with the traffic better. > >> > >> -- > >> John Baldwin > >> > > > > With net.isr.direct=0, my IBM system lessens CPU utilization per > > interface (bce0 and bce1) but swi1:net increase its utilization. > > Can you explained what's happening here? What does net.isr.direct do > > with the decrease of CPU utilization on its interface? I really wanted > > to know what happened internally during the packets being processed > > and received by the interfaces then to the device interrupt up to the > > software interrupt level because I am confused when enabling/disabling > > net.isr.direct in sysctl. Is there a tool that can we used to trace > > this process just to be able to know which part of the kernel internal > > is doing the bottleneck especially when net.isr.direct=1? By the way > > with device polling enabled, the system experienced packet errors and > > the interface throughput is worst, so I avoid using it though. 
> > > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > > > > 16 root 1 171 ki31 0K 16K CPU10 a 86:06 89.06% idle: cpu10 > > 27 root 1 -44 - 0K 16K CPU1 1 34:37 82.67% swi1: net > > 52 root 1 -68 - 0K 16K WAIT b 51:59 59.77% irq32: bce1 > > 15 root 1 171 ki31 0K 16K RUN b 69:28 43.16% idle: cpu11 > > 25 root 1 171 ki31 0K 16K RUN 1 115:35 24.27% idle: cpu1 > > 51 root 1 -68 - 0K 16K CPU10 a 35:21 13.48% irq31: bce0 > > > > > > Regards, > > Archimedes > > > > One more thing, I observed that when net.isr.direct=1, bce0 is using > irq256 and bce1 is using irq257, while with net.isr.direct=0, bce0 is > using irq31 and bce1 is using irq32. What makes the difference? That is not from net.isr.direct. irq256/257 is when the bce devices are using MSI; irq31/32 is when they are using INTx. -- John Baldwin From owner-freebsd-smp@FreeBSD.ORG Wed Nov 19 11:42:02 2008 Return-Path: Delivered-To: freebsd-smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DC6F0106564A; Wed, 19 Nov 2008 11:42:02 +0000 (UTC) (envelope-from takawata@init-main.com) Received: from sana.init-main.com (unknown [IPv6:2001:240:28::1]) by mx1.freebsd.org (Postfix) with ESMTP id 761798FC17; Wed, 19 Nov 2008 11:42:02 +0000 (UTC) (envelope-from takawata@init-main.com) Received: from init-main.com (localhost [127.0.0.1]) by sana.init-main.com (8.14.3/8.14.3) with ESMTP id mAJBi3Lg004559; Wed, 19 Nov 2008 20:44:03 +0900 (JST) (envelope-from takawata@init-main.com) Message-Id: <200811191144.mAJBi3Lg004559@sana.init-main.com> To: freebsd-current@freebsd.org, freebsd-hackers@freebsd.org, freebsd-smp@freebsd.org Date: Wed, 19 Nov 2008 20:44:03 +0900 From: Takanori Watanabe Cc: Subject: Core i7 anyone else? X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2008 11:42:03 -0000 Hi, I recently bought a Core i7 machine (for 145,000 JPY: about $1500) and it sometimes hangs up oddly. When it is in that state, only some specific processes keep working and it replies to ping, but it does not return any useful information. I suspect it may be caused by CPU power management, so I turned off almost all CPU power management features in the BIOS settings. Has anyone else encountered this kind of trouble? And on this machine a buildworld under SCHED_ULE (15 min.) is slower than under SCHED_4BSD (12 min.).
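Since the first suspect here is power management, the idle-method sysctls in the list below are worth recording before and after a hang. A minimal C sketch that dumps them, assuming only the machdep.idle* names shown in this message (equivalent to "sysctl machdep.idle machdep.idle_available"):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>

/* Print a string sysctl such as machdep.idle. */
static void
show(const char *name)
{
        char buf[256];
        size_t len = sizeof(buf);

        if (sysctlbyname(name, buf, &len, NULL, 0) == -1)
                err(1, "sysctlbyname(%s)", name);
        printf("%s: %s\n", name, buf);
}

int
main(void)
{
        show("machdep.idle");           /* idle method in use, e.g. acpi */
        show("machdep.idle_available"); /* methods that can be selected */
        return (0);
}

Switching machdep.idle to a simpler method such as hlt is one way to rule the idle path in or out as the cause.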
===dmesg=== http://www.init-main.com/corei7.dmesg or http://pastebin.com/m187f77aa (if host is down) =====DSDT==== http://www.init-main.com/corei7.asl or http://pastebin.com/m6879984a ==some sysctls== hw.machine: i386 hw.model: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz hw.ncpu: 8 hw.byteorder: 1234 hw.physmem: 3202322432 hw.usermem: 2956083200 hw.pagesize: 4096 hw.floatingpoint: 1 hw.machine_arch: i386 hw.realmem: 3211264000 == machdep.enable_panic_key: 0 machdep.adjkerntz: -32400 machdep.wall_cmos_clock: 1 machdep.disable_rtc_set: 0 machdep.disable_mtrrs: 0 machdep.guessed_bootdev: 2686451712 machdep.idle: acpi machdep.idle_available: spin, mwait, mwait_hlt, hlt, acpi, machdep.hlt_cpus: 0 machdep.prot_fault_translation: 0 machdep.panic_on_nmi: 1 machdep.kdb_on_nmi: 1 machdep.tsc_freq: 2684011396 machdep.i8254_freq: 1193182 machdep.acpi_timer_freq: 3579545 machdep.acpi_root: 1024240 machdep.hlt_logical_cpus: 0 machdep.logical_cpus_mask: 254 machdep.hyperthreading_allowed: 1 == kern.sched.preemption: 0 kern.sched.topology_spec: 0, 1, 2, 3, 4, 5, 6, 7 kern.sched.steal_thresh: 3 kern.sched.steal_idle: 1 kern.sched.steal_htt: 1 kern.sched.balance_interval: 133 kern.sched.balance: 1 kern.sched.affinity: 1 kern.sched.idlespinthresh: 4 kern.sched.idlespins: 10000 kern.sched.static_boost: 160 kern.sched.preempt_thresh: 0 kern.sched.interact: 30 kern.sched.slice: 13 kern.sched.name: ULE === From owner-freebsd-smp@FreeBSD.ORG Wed Nov 19 11:57:57 2008 Return-Path: Delivered-To: freebsd-smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C9411065672 for ; Wed, 19 Nov 2008 11:57:57 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from QMTA03.westchester.pa.mail.comcast.net (qmta03.westchester.pa.mail.comcast.net [76.96.62.32]) by mx1.freebsd.org (Postfix) with ESMTP id 422608FC1E for ; Wed, 19 Nov 2008 11:57:56 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from OMTA08.westchester.pa.mail.comcast.net ([76.96.62.12]) by QMTA03.westchester.pa.mail.comcast.net with comcast id gzjh1a00N0Fqzac53zmKKN; Wed, 19 Nov 2008 11:46:19 +0000 Received: from koitsu.dyndns.org ([69.181.141.110]) by OMTA08.westchester.pa.mail.comcast.net with comcast id gznE1a00Q2P6wsM3UznFzA; Wed, 19 Nov 2008 11:47:16 +0000 X-Authority-Analysis: v=1.0 c=1 a=DiZ76wm4AAAA:8 a=fGO4tVQLAAAA:8 a=6I5d2MoRAAAA:8 a=QycZ5dHgAAAA:8 a=8oBBsJxBPgm8k0VuyQAA:9 a=CmO0HorRqYcm4XbtWS8pjeXFcQ8A:4 a=EoioJ0NPDVgA:10 a=LY0hPdMaydYA:10 Received: by icarus.home.lan (Postfix, from userid 1000) id B30E933C36; Wed, 19 Nov 2008 03:47:14 -0800 (PST) Date: Wed, 19 Nov 2008 03:47:14 -0800 From: Jeremy Chadwick To: Takanori Watanabe Message-ID: <20081119114714.GA85533@icarus.home.lan> References: <200811191144.mAJBi3Lg004559@sana.init-main.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200811191144.mAJBi3Lg004559@sana.init-main.com> User-Agent: Mutt/1.5.18 (2008-05-17) Cc: freebsd-hackers@freebsd.org, freebsd-current@freebsd.org, freebsd-smp@freebsd.org Subject: Re: Core i7 anyone else? X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2008 11:57:57 -0000 On Wed, Nov 19, 2008 at 08:44:03PM +0900, Takanori Watanabe wrote: > Hi, I recently bought Core i7 machine(for 145,000JPY: about $1500) > and sometimes hangs up oddly. 
> When in the state, some specific process only works and > replys ping, but not reply any useful information. > > I suspect it may caused by CPU power management, so I cut > almost all CPU power management feature on BIOS parameter. > > Are there any people encouterd such trouble? > And on this machine build world in SCHED_ULE(15min.) is slower > than SCHED_4BSD(12min.). > > > ===dmesg=== > http://www.init-main.com/corei7.dmesg > or > http://pastebin.com/m187f77aa > (if host is down) > > =====DSDT==== > http://www.init-main.com/corei7.asl > or > http://pastebin.com/m6879984a > > ==some sysctls== > hw.machine: i386 > hw.model: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz > hw.ncpu: 8 > hw.byteorder: 1234 > hw.physmem: 3202322432 > hw.usermem: 2956083200 > hw.pagesize: 4096 > hw.floatingpoint: 1 > hw.machine_arch: i386 > hw.realmem: 3211264000 > == > machdep.enable_panic_key: 0 > machdep.adjkerntz: -32400 > machdep.wall_cmos_clock: 1 > machdep.disable_rtc_set: 0 > machdep.disable_mtrrs: 0 > machdep.guessed_bootdev: 2686451712 > machdep.idle: acpi > machdep.idle_available: spin, mwait, mwait_hlt, hlt, acpi, > machdep.hlt_cpus: 0 > machdep.prot_fault_translation: 0 > machdep.panic_on_nmi: 1 > machdep.kdb_on_nmi: 1 > machdep.tsc_freq: 2684011396 > machdep.i8254_freq: 1193182 > machdep.acpi_timer_freq: 3579545 > machdep.acpi_root: 1024240 > machdep.hlt_logical_cpus: 0 > machdep.logical_cpus_mask: 254 > machdep.hyperthreading_allowed: 1 > == > kern.sched.preemption: 0 > kern.sched.topology_spec: > > 0, 1, 2, 3, 4, 5, 6, 7 > > > > > kern.sched.steal_thresh: 3 > kern.sched.steal_idle: 1 > kern.sched.steal_htt: 1 > kern.sched.balance_interval: 133 > kern.sched.balance: 1 > kern.sched.affinity: 1 > kern.sched.idlespinthresh: 4 > kern.sched.idlespins: 10000 > kern.sched.static_boost: 160 > kern.sched.preempt_thresh: 0 > kern.sched.interact: 30 > kern.sched.slice: 13 > kern.sched.name: ULE > === When building world/kernel, do you see odd behaviour (on CURRENT) such as the load average being absurdly high, or processes (anything; sh, make, mutt, etc.) getting stuck in bizarre states? These things are what caused my buildworld/buildkernel times to increase (compared to RELENG_7). I was using ULE entirely (on CURRENT and RELENG_7), but did not try 4BSD. I documented my experience. http://wiki.freebsd.org/JeremyChadwick/Bizarre_CURRENT_experience I have no idea if your problem is the same as mine. This is purely speculative on my part. (And readers of that Wiki article should note that the problem was not hardware-related) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP: 4BD6C0CB | From owner-freebsd-smp@FreeBSD.ORG Wed Nov 19 12:05:11 2008 Return-Path: Delivered-To: freebsd-smp@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 83ECE106567C for ; Wed, 19 Nov 2008 12:05:11 +0000 (UTC) (envelope-from freebsd-smp@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 043238FC18 for ; Wed, 19 Nov 2008 12:05:10 +0000 (UTC) (envelope-from freebsd-smp@m.gmane.org) Received: from root by ciao.gmane.org with local (Exim 4.43) id 1L2lnr-0005ay-4g for freebsd-smp@freebsd.org; Wed, 19 Nov 2008 12:05:04 +0000 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 19 Nov 2008 12:05:03 +0000 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 19 Nov 2008 12:05:03 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-smp@freebsd.org From: Ivan Voras Date: Wed, 19 Nov 2008 12:58:54 +0100 Lines: 92 Message-ID: <4923FF7E.1080101@freebsd.org> References: <200811191144.mAJBi3Lg004559@sana.init-main.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enigCFC1809BFB1D0993DBF70FC6" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Thunderbird 2.0.0.17 (X11/20080925) In-Reply-To: <200811191144.mAJBi3Lg004559@sana.init-main.com> X-Enigmail-Version: 0.95.0 Sender: news Cc: freebsd-hackers@freebsd.org, freebsd-current@freebsd.org, freebsd-smp@freebsd.org Subject: Re: Core i7 anyone else? X-BeenThere: freebsd-smp@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD SMP implementation group List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Nov 2008 12:05:11 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigCFC1809BFB1D0993DBF70FC6 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Takanori Watanabe wrote: > Hi, I recently bought a Core i7 machine (for 145,000 JPY: about $1500) > and it sometimes hangs up oddly. > When it is in that state, only some specific processes keep working > and it replies to ping, but it does not return any useful information. > > I suspect it may be caused by CPU power management, so I turned off > almost all CPU power management features in the BIOS settings. > > Has anyone else encountered this kind of trouble? > And on this machine a buildworld under SCHED_ULE (15 min.) is slower > than under SCHED_4BSD (12 min.).
I don't know, but this: > ===dmesg=== > http://www.init-main.com/corei7.dmesg > or > http://pastebin.com/m187f77aa > (if host is down) CPU: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz (2684.00-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x106a4 Stepping = 4 Features=0xbfebfbff Features2=0x98e3bd AMD Features=0x28100000 AMD Features2=0x1 Cores per package: 8 Logical CPUs per core: 2 real memory = 3211264000 (3062 MB) avail memory = 3143983104 (2998 MB) ACPI APIC Table: <7522MS A7522100> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 cpu4 (AP): APIC ID: 4 cpu5 (AP): APIC ID: 5 cpu6 (AP): APIC ID: 6 cpu7 (AP): APIC ID: 7 is a bit in conflict with this: > kern.sched.topology_spec: > > 0, 1, 2, 3, 4, 5, 6, 7 > > > From what I know of its architecture, the i7 has hyperthreading - i.e. the CPU has 4 "real" cores which are hyperthreaded, so you get 8 logical CPUs total. It probably also uses a different way of enumerating its topology, which might have caused the wrong topology detection and your slowdown in buildworld. (The CPU also has an L3 cache, but I think that is not looked at in topology detection.) I don't know if this particular error could be responsible for your lockups - probably not. The CPU also introduces some big changes in power management (dynamic powerdown of individual cores) which could cause them - but I can't help you there. Are you sure it's not something trivial like overheating? --------------enigCFC1809BFB1D0993DBF70FC6 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJI/9+ldnAQVacBcgRAptBAKCvy5iMZkVJ7f/v/8jWVRvs0Oa1vwCgnlPY fl3ySAZXU5NXl0ZmOXf43t4= =hTDW -----END PGP SIGNATURE----- --------------enigCFC1809BFB1D0993DBF70FC6--
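As a footnote to the topology question: ULE exports the topology it detected through the kern.sched.topology_spec sysctl quoted above, so the detected layout can be captured at runtime and compared against the expected 4-cores-times-2-threads arrangement. A minimal C sketch, assuming a kernel running SCHED_ULE (the sysctl is provided by ULE and is not present under SCHED_4BSD):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
        char *buf;
        size_t len = 0;

        /* First call sizes the buffer, second call fetches the XML. */
        if (sysctlbyname("kern.sched.topology_spec", NULL, &len,
            NULL, 0) == -1)
                err(1, "sysctlbyname(kern.sched.topology_spec)");
        if ((buf = malloc(len)) == NULL)
                err(1, "malloc");
        if (sysctlbyname("kern.sched.topology_spec", buf, &len,
            NULL, 0) == -1)
                err(1, "sysctlbyname(kern.sched.topology_spec)");
        fwrite(buf, 1, len, stdout);
        putchar('\n');
        free(buf);
        return (0);
}

If the output shows one flat group of all 8 logical CPUs, with no separate level for the hyperthread pairs, that supports the wrong-topology-detection theory above.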