From owner-freebsd-smp@FreeBSD.ORG  Mon Nov 17 11:11:01 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 685861065678
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 11:11:01 +0000 (UTC)
	(envelope-from archimedes.gaviola@gmail.com)
Received: from rv-out-0506.google.com (rv-out-0506.google.com [209.85.198.236])
	by mx1.freebsd.org (Postfix) with ESMTP id 3A05D8FC1B
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 11:11:00 +0000 (UTC)
	(envelope-from archimedes.gaviola@gmail.com)
Received: by rv-out-0506.google.com with SMTP id b25so2282955rvf.43
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 03:11:00 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from:to
	:subject:cc:in-reply-to:mime-version:content-type
	:content-transfer-encoding:content-disposition:references;
	bh=fGVHgR0LABWszzv9fduqH285vli6uWOgvFgOhzs4J3Q=;
	b=MuS42+YF1eisZG4dwOyGA15agH5ASE+eZIWNhCd89OFWkJZflXhaE1/Q/yeIOtEs/d
	nYwy9V6nHQ8Y5uUfXWAJrR6C009wMNQ6Yg/t9AtB/yDPoEyliej8hm4xbTtJrxHP3RE+
	Latd+u7MhXF7DLqqv17FOGhIgEU5HFvsJ9AfM=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:to:subject:cc:in-reply-to:mime-version
	:content-type:content-transfer-encoding:content-disposition
	:references;
	b=Udn6JI+XZq/VnqSTH8pv6xYfZ6UbRTJD4aHGcj7JY2g51CupdupAjPU/JGoMniTbwB
	wuKl01sT7CRNci6PxutKnQM2zQ6+jGoNl/DNAd0e5CzISFmg20DJX9r057baDgM0MWzt
	YqlKr6yoXAlj81PiRSXqzoIXk7i3ypLtydP0M=
Received: by 10.114.174.2 with SMTP id w2mr2415221wae.195.1226920260630;
	Mon, 17 Nov 2008 03:11:00 -0800 (PST)
Received: by 10.115.76.12 with HTTP; Mon, 17 Nov 2008 03:11:00 -0800 (PST)
Message-ID: <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
Date: Mon, 17 Nov 2008 19:11:00 +0800
From: "Archimedes Gaviola" <archimedes.gaviola@gmail.com>
To: "John Baldwin" <jhb@freebsd.org>
In-Reply-To: <200811131128.55220.jhb@freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>
	<200811111216.37462.jhb@freebsd.org>
	<42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com>
	<200811131128.55220.jhb@freebsd.org>
Cc: freebsd-smp@freebsd.org
Subject: Re: CPU affinity with ULE scheduler
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Nov 2008 11:11:01 -0000

On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin <jhb@freebsd.org> wrote:
> On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote:
>> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin <jhb@freebsd.org> wrote:
>> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote:
>> >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin <jhb@freebsd.org> wrote:
>> >> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote:
>> >> >> To Whom It May Concerned:
>> >> >>
>> >> >> Can someone explain or share about ULE scheduler (latest version 2 if
>> >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing
>> >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD
>> >> >> scheduler and as what I have observed especially on processing high
>> >> >> network load traffic on multiple CPU cores, only one CPU were being
>> >> >> stressed with network interrupt while the rests are mostly in idle
>> >> >> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom
>> >> >> network interface cards (bce0 and bce1). Below is the snapshot of the
>> >> >> case.
>> >> >
>> >> > Interrupts are routed to a single CPU.  Since bce0 and bce1 are both on
>> > the
>> >> > same interrupt (irq 23), the CPU that interrupt is routed to is going
> to
>> > end
>> >> > up handling all the interrupts for bce0 and bce1.  This not something
> ULE
>> > or
>> >> > 4BSD have any control over.
>> >> >
>> >> > --
>> >> > John Baldwin
>> >> >
>> >>
>> >> Hi John,
>> >>
>> >> I'm sorry for the wrong snapshot. Here's the right one with my concern.
>> >>
>> >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>> >>    17 root        1 171   52     0K    16K CPU0   0  54:28 95.17% idle:
> cpu0
>> >>    15 root        1 171   52     0K    16K CPU2   2  55:55 93.65% idle:
> cpu2
>> >>    14 root        1 171   52     0K    16K CPU3   3  58:53 93.55% idle:
> cpu3
>> >>    13 root        1 171   52     0K    16K RUN    4  59:14 82.47% idle:
> cpu4
>> >>    12 root        1 171   52     0K    16K RUN    5  55:42 82.23% idle:
> cpu5
>> >>    16 root        1 171   52     0K    16K CPU1   1  58:13 77.78% idle:
> cpu1
>> >>    11 root        1 171   52     0K    16K CPU6   6  54:08 76.17% idle:
> cpu6
>> >>    36 root        1 -68 -187     0K    16K WAIT   7   8:50 65.53%
>> >> irq23: bce0 bce1
>> >>    10 root        1 171   52     0K    16K CPU7   7  48:19 29.79% idle:
> cpu7
>> >>    43 root        1 171   52     0K    16K pgzero 2   0:35  1.51%
> pagezero
>> >>  1372 root       10  20    0 16716K  5764K kserel 6  58:42  0.00% kmd
>> >>  4488 root        1  96    0 30676K  4236K select 2   1:51  0.00% sshd
>> >>    18 root        1 -32 -151     0K    16K WAIT   0   1:14  0.00% swi4:
>> > clock s
>> >>    20 root        1 -44 -163     0K    16K WAIT   0   0:30  0.00% swi1:
> net
>> >>   218 root        1  96    0  3852K  1376K select 0   0:23  0.00% syslogd
>> >>  2171 root        1  96    0 30676K  4224K select 6   0:19  0.00% sshd
>> >>
>> >> Actually I was doing a network performance testing on this system with
>> >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a
>> >> tool to generate big amount of traffic around 600Mbps-700Mbps
>> >> traversing the FreeBSD system in bi-direction, meaning both network
>> >> interfaces are receiving traffic. What happened was, the CPU (cpu7)
>> >> that handles the (irq 23) on both interfaces consumed big amount of
>> >> CPU utilization around 65.53% in which it affects other running
>> >> applications and services like sshd and httpd. It's no longer
>> >> accessible when traffic is bombarded. With the current situation of my
>> >> FreeBSD system with only one CPU being stressed, I was thinking of
>> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought
>> >> my concern has something to do with the distributions of load on
>> >> multiple CPU cores handled by the scheduler especially at the network
>> >> level, processing network load. So, if it is more of interrupt
>> >> handling and not on the scheduler, is there a way we can optimize it?
>> >> Because if it still routed only to one CPU then for me it's still
>> >> inefficient. Who handles interrupt scheduling for bounding CPU in
>> >> order to prevent shared IRQ? Is there any improvements with
>> >> FreeBSD-7.0 with regards to interrupt handling?
>> >
>> > It depends.  In all likelihood, the interrupts from bce0 and bce1 are both
>> > hardwired to the same interrupt pin and so they will always share the same
>> > ithread when using the legacy INTx interrupts.  However, bce(4) parts do
>> > support MSI, and if you try a newer OS snap (6.3 or later) these devices
>> > should use MSI in which case each NIC would be assigned to a separate CPU.
> I
>> > would suggest trying 7.0 or a 7.1 release candidate and see if it does
>> > better.
>> >
>> > --
>> > John Baldwin
>> >
>>
>> Hi John,
>>
>> I try 7.0 release and each network interface were already allocated
>> separately on different CPU. Here, MSI is already working.
>>
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>    12 root        1 171 ki31     0K    16K CPU6   6 123:55 100.00% idle:
> cpu6
>>    15 root        1 171 ki31     0K    16K CPU3   3 123:54 100.00% idle:
> cpu3
>>    14 root        1 171 ki31     0K    16K CPU4   4 123:26 100.00% idle:
> cpu4
>>    16 root        1 171 ki31     0K    16K CPU2   2 123:15 100.00% idle:
> cpu2
>>    17 root        1 171 ki31     0K    16K CPU1   1 123:15 100.00% idle:
> cpu1
>>    37 root        1 -68    -     0K    16K CPU7   7   9:09 100.00% irq256:
> bce0
>>    13 root        1 171 ki31     0K    16K CPU5   5 123:49 99.07% idle: cpu5
>>    40 root        1 -68    -     0K    16K WAIT   0   4:40 51.17% irq257:
> bce1
>>    18 root        1 171 ki31     0K    16K RUN    0 117:48 49.37% idle: cpu0
>>    11 root        1 171 ki31     0K    16K RUN    7 115:25  0.00% idle: cpu7
>>    19 root        1 -32    -     0K    16K WAIT   0   0:39  0.00% swi4:
> clock s
>> 14367 root        1  44    0  5176K  3104K select 2   0:01  0.00% dhcpd
>>    22 root        1 -16    -     0K    16K -      3   0:01  0.00% yarrow
>>    25 root        1 -24    -     0K    16K WAIT   0   0:00  0.00% swi6:
> Giant t
>> 11658 root        1  44    0 32936K  4540K select 1   0:00  0.00% sshd
>> 14224 root        1  44    0 32936K  4540K select 5   0:00  0.00% sshd
>>    41 root        1 -60    -     0K    16K WAIT   0   0:00  0.00% irq1:
> atkbd0
>>     4 root        1  -8    -     0K    16K -      2   0:00  0.00% g_down
>>
>> The bce0 interface interrupt (irq256) gets stressed out which already
>> have 100% of CPU7 while CPU0 is around 51.17%. Any more
>> recommendations? Is there anything we can do about optimization with
>> MSI?
>
> Well, on 7.x you can try turning net.isr.direct off (sysctl).  However, it
> seems you are hammering your bce0 interface.  You might want to try using
> polling on bce0 and seeing if it keeps up with the traffic better.
>
> --
> John Baldwin
>

With net.isr.direct=0, my IBM system lessens CPU utilization per
interface (bce0 and bce1) but swi1:net increase its utilization.
Can you explained what's happening here? What does net.isr.direct do
with the decrease of CPU utilization on its interface? I really wanted
to know what happened internally during the packets being processed
and received by the interfaces then to the device interrupt up to the
software interrupt level because I am confused when enabling/disabling
net.isr.direct in sysctl. Is there a tool that can we used to trace
this process just to be able to know which part of the kernel internal
is doing the bottleneck especially when net.isr.direct=1? By the way
with device polling enabled, the system experienced packet errors and
the interface throughput is worst, so I avoid using it though.

   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND

   16 root        1 171 ki31     0K    16K CPU10  a  86:06 89.06% idle: cpu10
   27 root        1 -44    -     0K    16K CPU1   1  34:37 82.67% swi1: net
   52 root        1 -68    -     0K    16K WAIT   b  51:59 59.77% irq32: bce1
   15 root        1 171 ki31     0K    16K RUN    b  69:28 43.16% idle: cpu11
   25 root        1 171 ki31     0K    16K RUN    1 115:35 24.27% idle: cpu1
   51 root        1 -68    -     0K    16K CPU10  a  35:21 13.48% irq31: bce0


Regards,
Archimedes

From owner-freebsd-smp@FreeBSD.ORG  Mon Nov 17 11:36:41 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 800FF1065673
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 11:36:41 +0000 (UTC)
	(envelope-from archimedes.gaviola@gmail.com)
Received: from wa-out-1112.google.com (wa-out-1112.google.com [209.85.146.179])
	by mx1.freebsd.org (Postfix) with ESMTP id 519CD8FC23
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 11:36:41 +0000 (UTC)
	(envelope-from archimedes.gaviola@gmail.com)
Received: by wa-out-1112.google.com with SMTP id m34so1311452wag.27
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 03:36:41 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:received:received:message-id:date:from:to
	:subject:cc:in-reply-to:mime-version:content-type
	:content-transfer-encoding:content-disposition:references;
	bh=jIzzl4DLAEPTzJU02EzRDyuAEsm7Nd+/CAxlg1Ahg+M=;
	b=GspAKjyFRLCc6G6IEnbYLDqRa9dXNG7iKCtidCFx/tMZhOLxzDod+jhxte1ahHhS7i
	VF+qw7r/9TFZPoYCPG8croUUxe7AHWl+qzUQkTjIJe/zP+FN5XvoQxBlLEC2m3j4siNu
	ul6P9/8u/6vQLgBetkF6dGLCvGnbZRsHB9e2o=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=message-id:date:from:to:subject:cc:in-reply-to:mime-version
	:content-type:content-transfer-encoding:content-disposition
	:references;
	b=ipflqQFUuAw5ugQA5Ll6g+ep3hY/sBt4FLuWFdure3NBD0NJJaJhLv0WNkYt2TgBoE
	h3+NcQjuLVg2oVfLiVODCN3/yUchLI+44GFPAyGY6LT+73HqzCZaI/KX7hfr4PUWUd2i
	AUZ1VWPMoqzX3tXPrLWFq+2HdhPMnehE9jR3c=
Received: by 10.114.39.5 with SMTP id m5mr2427993wam.214.1226921801032;
	Mon, 17 Nov 2008 03:36:41 -0800 (PST)
Received: by 10.115.76.12 with HTTP; Mon, 17 Nov 2008 03:36:40 -0800 (PST)
Message-ID: <42e3d810811170336rf0a0357sf32035e8bd1489e9@mail.gmail.com>
Date: Mon, 17 Nov 2008 19:36:40 +0800
From: "Archimedes Gaviola" <archimedes.gaviola@gmail.com>
To: "John Baldwin" <jhb@freebsd.org>
In-Reply-To: <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>
	<200811111216.37462.jhb@freebsd.org>
	<42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com>
	<200811131128.55220.jhb@freebsd.org>
	<42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
Cc: freebsd-smp@freebsd.org
Subject: Re: CPU affinity with ULE scheduler
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Nov 2008 11:36:41 -0000

On Mon, Nov 17, 2008 at 7:11 PM, Archimedes Gaviola
<archimedes.gaviola@gmail.com> wrote:
> On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin <jhb@freebsd.org> wrote:
>> On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote:
>>> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin <jhb@freebsd.org> wrote:
>>> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote:
>>> >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin <jhb@freebsd.org> wrote:
>>> >> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote:
>>> >> >> To Whom It May Concerned:
>>> >> >>
>>> >> >> Can someone explain or share about ULE scheduler (latest version 2 if
>>> >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing
>>> >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD
>>> >> >> scheduler and as what I have observed especially on processing high
>>> >> >> network load traffic on multiple CPU cores, only one CPU were being
>>> >> >> stressed with network interrupt while the rests are mostly in idle
>>> >> >> state. This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom
>>> >> >> network interface cards (bce0 and bce1). Below is the snapshot of the
>>> >> >> case.
>>> >> >
>>> >> > Interrupts are routed to a single CPU.  Since bce0 and bce1 are both on
>>> > the
>>> >> > same interrupt (irq 23), the CPU that interrupt is routed to is going
>> to
>>> > end
>>> >> > up handling all the interrupts for bce0 and bce1.  This not something
>> ULE
>>> > or
>>> >> > 4BSD have any control over.
>>> >> >
>>> >> > --
>>> >> > John Baldwin
>>> >> >
>>> >>
>>> >> Hi John,
>>> >>
>>> >> I'm sorry for the wrong snapshot. Here's the right one with my concern.
>>> >>
>>> >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>> >>    17 root        1 171   52     0K    16K CPU0   0  54:28 95.17% idle:
>> cpu0
>>> >>    15 root        1 171   52     0K    16K CPU2   2  55:55 93.65% idle:
>> cpu2
>>> >>    14 root        1 171   52     0K    16K CPU3   3  58:53 93.55% idle:
>> cpu3
>>> >>    13 root        1 171   52     0K    16K RUN    4  59:14 82.47% idle:
>> cpu4
>>> >>    12 root        1 171   52     0K    16K RUN    5  55:42 82.23% idle:
>> cpu5
>>> >>    16 root        1 171   52     0K    16K CPU1   1  58:13 77.78% idle:
>> cpu1
>>> >>    11 root        1 171   52     0K    16K CPU6   6  54:08 76.17% idle:
>> cpu6
>>> >>    36 root        1 -68 -187     0K    16K WAIT   7   8:50 65.53%
>>> >> irq23: bce0 bce1
>>> >>    10 root        1 171   52     0K    16K CPU7   7  48:19 29.79% idle:
>> cpu7
>>> >>    43 root        1 171   52     0K    16K pgzero 2   0:35  1.51%
>> pagezero
>>> >>  1372 root       10  20    0 16716K  5764K kserel 6  58:42  0.00% kmd
>>> >>  4488 root        1  96    0 30676K  4236K select 2   1:51  0.00% sshd
>>> >>    18 root        1 -32 -151     0K    16K WAIT   0   1:14  0.00% swi4:
>>> > clock s
>>> >>    20 root        1 -44 -163     0K    16K WAIT   0   0:30  0.00% swi1:
>> net
>>> >>   218 root        1  96    0  3852K  1376K select 0   0:23  0.00% syslogd
>>> >>  2171 root        1  96    0 30676K  4224K select 6   0:19  0.00% sshd
>>> >>
>>> >> Actually I was doing a network performance testing on this system with
>>> >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a
>>> >> tool to generate big amount of traffic around 600Mbps-700Mbps
>>> >> traversing the FreeBSD system in bi-direction, meaning both network
>>> >> interfaces are receiving traffic. What happened was, the CPU (cpu7)
>>> >> that handles the (irq 23) on both interfaces consumed big amount of
>>> >> CPU utilization around 65.53% in which it affects other running
>>> >> applications and services like sshd and httpd. It's no longer
>>> >> accessible when traffic is bombarded. With the current situation of my
>>> >> FreeBSD system with only one CPU being stressed, I was thinking of
>>> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought
>>> >> my concern has something to do with the distributions of load on
>>> >> multiple CPU cores handled by the scheduler especially at the network
>>> >> level, processing network load. So, if it is more of interrupt
>>> >> handling and not on the scheduler, is there a way we can optimize it?
>>> >> Because if it still routed only to one CPU then for me it's still
>>> >> inefficient. Who handles interrupt scheduling for bounding CPU in
>>> >> order to prevent shared IRQ? Is there any improvements with
>>> >> FreeBSD-7.0 with regards to interrupt handling?
>>> >
>>> > It depends.  In all likelihood, the interrupts from bce0 and bce1 are both
>>> > hardwired to the same interrupt pin and so they will always share the same
>>> > ithread when using the legacy INTx interrupts.  However, bce(4) parts do
>>> > support MSI, and if you try a newer OS snap (6.3 or later) these devices
>>> > should use MSI in which case each NIC would be assigned to a separate CPU.
>> I
>>> > would suggest trying 7.0 or a 7.1 release candidate and see if it does
>>> > better.
>>> >
>>> > --
>>> > John Baldwin
>>> >
>>>
>>> Hi John,
>>>
>>> I try 7.0 release and each network interface were already allocated
>>> separately on different CPU. Here, MSI is already working.
>>>
>>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>>>    12 root        1 171 ki31     0K    16K CPU6   6 123:55 100.00% idle:
>> cpu6
>>>    15 root        1 171 ki31     0K    16K CPU3   3 123:54 100.00% idle:
>> cpu3
>>>    14 root        1 171 ki31     0K    16K CPU4   4 123:26 100.00% idle:
>> cpu4
>>>    16 root        1 171 ki31     0K    16K CPU2   2 123:15 100.00% idle:
>> cpu2
>>>    17 root        1 171 ki31     0K    16K CPU1   1 123:15 100.00% idle:
>> cpu1
>>>    37 root        1 -68    -     0K    16K CPU7   7   9:09 100.00% irq256:
>> bce0
>>>    13 root        1 171 ki31     0K    16K CPU5   5 123:49 99.07% idle: cpu5
>>>    40 root        1 -68    -     0K    16K WAIT   0   4:40 51.17% irq257:
>> bce1
>>>    18 root        1 171 ki31     0K    16K RUN    0 117:48 49.37% idle: cpu0
>>>    11 root        1 171 ki31     0K    16K RUN    7 115:25  0.00% idle: cpu7
>>>    19 root        1 -32    -     0K    16K WAIT   0   0:39  0.00% swi4:
>> clock s
>>> 14367 root        1  44    0  5176K  3104K select 2   0:01  0.00% dhcpd
>>>    22 root        1 -16    -     0K    16K -      3   0:01  0.00% yarrow
>>>    25 root        1 -24    -     0K    16K WAIT   0   0:00  0.00% swi6:
>> Giant t
>>> 11658 root        1  44    0 32936K  4540K select 1   0:00  0.00% sshd
>>> 14224 root        1  44    0 32936K  4540K select 5   0:00  0.00% sshd
>>>    41 root        1 -60    -     0K    16K WAIT   0   0:00  0.00% irq1:
>> atkbd0
>>>     4 root        1  -8    -     0K    16K -      2   0:00  0.00% g_down
>>>
>>> The bce0 interface interrupt (irq256) gets stressed out which already
>>> have 100% of CPU7 while CPU0 is around 51.17%. Any more
>>> recommendations? Is there anything we can do about optimization with
>>> MSI?
>>
>> Well, on 7.x you can try turning net.isr.direct off (sysctl).  However, it
>> seems you are hammering your bce0 interface.  You might want to try using
>> polling on bce0 and seeing if it keeps up with the traffic better.
>>
>> --
>> John Baldwin
>>
>
> With net.isr.direct=0, my IBM system lessens CPU utilization per
> interface (bce0 and bce1) but swi1:net increase its utilization.
> Can you explained what's happening here? What does net.isr.direct do
> with the decrease of CPU utilization on its interface? I really wanted
> to know what happened internally during the packets being processed
> and received by the interfaces then to the device interrupt up to the
> software interrupt level because I am confused when enabling/disabling
> net.isr.direct in sysctl. Is there a tool that can we used to trace
> this process just to be able to know which part of the kernel internal
> is doing the bottleneck especially when net.isr.direct=1? By the way
> with device polling enabled, the system experienced packet errors and
> the interface throughput is worst, so I avoid using it though.
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>
>   16 root        1 171 ki31     0K    16K CPU10  a  86:06 89.06% idle: cpu10
>   27 root        1 -44    -     0K    16K CPU1   1  34:37 82.67% swi1: net
>   52 root        1 -68    -     0K    16K WAIT   b  51:59 59.77% irq32: bce1
>   15 root        1 171 ki31     0K    16K RUN    b  69:28 43.16% idle: cpu11
>   25 root        1 171 ki31     0K    16K RUN    1 115:35 24.27% idle: cpu1
>   51 root        1 -68    -     0K    16K CPU10  a  35:21 13.48% irq31: bce0
>
>
> Regards,
> Archimedes
>

One more thing, I observed that when net.isr.direct=1, bce0 is using
irq256 and bce1 is using irq257 while net.isr.direct=0, bce0 is now
using irq31 and bce1 is using irq32. What makes it different?

From owner-freebsd-smp@FreeBSD.ORG  Mon Nov 17 13:18:02 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 48EDA1065673
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 13:18:02 +0000 (UTC)
	(envelope-from freebsd-smp@m.gmane.org)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by mx1.freebsd.org (Postfix) with ESMTP id C394F8FC1A
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 13:18:01 +0000 (UTC)
	(envelope-from freebsd-smp@m.gmane.org)
Received: from list by ciao.gmane.org with local (Exim 4.43)
	id 1L23zK-0005Rg-CP
	for freebsd-smp@freebsd.org; Mon, 17 Nov 2008 13:17:58 +0000
Received: from lara.cc.fer.hr ([161.53.72.113])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 13:17:58 +0000
Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 13:17:58 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-smp@freebsd.org
From: Ivan Voras <ivoras@freebsd.org>
Date: Mon, 17 Nov 2008 14:18:36 +0100
Lines: 43
Message-ID: <gfrqtv$ole$1@ger.gmane.org>
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>	<200811111216.37462.jhb@freebsd.org>	<42e3d810811130355x3857bceap447e134b18eee04b@mail.gmail.com>	<200811131128.55220.jhb@freebsd.org>
	<42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enig72EB5AC1004F9F67F4407EC7"
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr
User-Agent: Thunderbird 2.0.0.17 (X11/20080925)
In-Reply-To: <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
X-Enigmail-Version: 0.95.0
Sender: news <news@ger.gmane.org>
Subject: Re: CPU affinity with ULE scheduler
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Nov 2008 13:18:02 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig72EB5AC1004F9F67F4407EC7
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Archimedes Gaviola wrote:

> With net.isr.direct=3D0, my IBM system lessens CPU utilization per
> interface (bce0 and bce1) but swi1:net increase its utilization.
> Can you explained what's happening here? What does net.isr.direct do
> with the decrease of CPU utilization on its interface?=20

The system has a choice between processing the packets in the interrupt
handler (the "irq:bce" process) or in a dedicated network process (the
"swi:net" process). This is about protocol handling not simply receiving
packets. With net.isr.direct you're toggling between those two options.
If "direct" is 1, the packets are processed in the interrupt handler; if
it's 0, the processing is delegated to swi. It's set to 1 by default
because this setting should yield best latency.

In both cases the code path a packet must go through is very similar: it
has to be received, then processed through firewalls and network stack
code, then delivered to application(s), so it's a serial process. There
are things that could be better parallelized in the stack and people are
working on them, but they will not be finished any time soon.


--------------enig72EB5AC1004F9F67F4407EC7
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJIW8sldnAQVacBcgRAnPQAKC+5qlyAtI+mTT5eFP4te2BX8EWXgCg+REw
Ff9Lv7GNlBhrtGNsp9Ojkss=
=0AJK
-----END PGP SIGNATURE-----

--------------enig72EB5AC1004F9F67F4407EC7--


From owner-freebsd-smp@FreeBSD.ORG  Mon Nov 17 21:13:50 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6F8DD1065677
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 21:13:50 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id A4D738FC14
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 21:13:49 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from localhost.corp.yahoo.com (john@localhost [IPv6:::1])
	(authenticated bits=0)
	by server.baldwin.cx (8.14.3/8.14.3) with ESMTP id mAHLDgur033788;
	Mon, 17 Nov 2008 16:13:42 -0500 (EST) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: "Archimedes Gaviola" <archimedes.gaviola@gmail.com>
Date: Mon, 17 Nov 2008 16:09:15 -0500
User-Agent: KMail/1.9.7
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>
	<200811131128.55220.jhb@freebsd.org>
	<42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
In-Reply-To: <42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200811171609.15913.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [IPv6:::1]);
	Mon, 17 Nov 2008 16:13:43 -0500 (EST)
X-Virus-Scanned: ClamAV 0.93.1/8642/Sun Nov 16 23:01:08 2008 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00,NO_RELAYS 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: freebsd-smp@freebsd.org
Subject: Re: CPU affinity with ULE scheduler
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Nov 2008 21:13:50 -0000

On Monday 17 November 2008 06:11:00 am Archimedes Gaviola wrote:
> On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin <jhb@freebsd.org> wrote:
> > On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote:
> >> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin <jhb@freebsd.org> wrote:
> >> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote:
> >> >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin <jhb@freebsd.org> wrote:
> >> >> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote:
> >> >> >> To Whom It May Concerned:
> >> >> >>
> >> >> >> Can someone explain or share about ULE scheduler (latest version 2 
if
> >> >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing
> >> >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD
> >> >> >> scheduler and as what I have observed especially on processing high
> >> >> >> network load traffic on multiple CPU cores, only one CPU were being
> >> >> >> stressed with network interrupt while the rests are mostly in idle
> >> >> >> state. This is an AMD-64 (4x) dual-core IBM system with GigE 
Broadcom
> >> >> >> network interface cards (bce0 and bce1). Below is the snapshot of 
the
> >> >> >> case.
> >> >> >
> >> >> > Interrupts are routed to a single CPU.  Since bce0 and bce1 are both 
on
> >> > the
> >> >> > same interrupt (irq 23), the CPU that interrupt is routed to is 
going
> > to
> >> > end
> >> >> > up handling all the interrupts for bce0 and bce1.  This not 
something
> > ULE
> >> > or
> >> >> > 4BSD have any control over.
> >> >> >
> >> >> > --
> >> >> > John Baldwin
> >> >> >
> >> >>
> >> >> Hi John,
> >> >>
> >> >> I'm sorry for the wrong snapshot. Here's the right one with my 
concern.
> >> >>
> >> >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU 
COMMAND
> >> >>    17 root        1 171   52     0K    16K CPU0   0  54:28 95.17% 
idle:
> > cpu0
> >> >>    15 root        1 171   52     0K    16K CPU2   2  55:55 93.65% 
idle:
> > cpu2
> >> >>    14 root        1 171   52     0K    16K CPU3   3  58:53 93.55% 
idle:
> > cpu3
> >> >>    13 root        1 171   52     0K    16K RUN    4  59:14 82.47% 
idle:
> > cpu4
> >> >>    12 root        1 171   52     0K    16K RUN    5  55:42 82.23% 
idle:
> > cpu5
> >> >>    16 root        1 171   52     0K    16K CPU1   1  58:13 77.78% 
idle:
> > cpu1
> >> >>    11 root        1 171   52     0K    16K CPU6   6  54:08 76.17% 
idle:
> > cpu6
> >> >>    36 root        1 -68 -187     0K    16K WAIT   7   8:50 65.53%
> >> >> irq23: bce0 bce1
> >> >>    10 root        1 171   52     0K    16K CPU7   7  48:19 29.79% 
idle:
> > cpu7
> >> >>    43 root        1 171   52     0K    16K pgzero 2   0:35  1.51%
> > pagezero
> >> >>  1372 root       10  20    0 16716K  5764K kserel 6  58:42  0.00% kmd
> >> >>  4488 root        1  96    0 30676K  4236K select 2   1:51  0.00% sshd
> >> >>    18 root        1 -32 -151     0K    16K WAIT   0   1:14  0.00% 
swi4:
> >> > clock s
> >> >>    20 root        1 -44 -163     0K    16K WAIT   0   0:30  0.00% 
swi1:
> > net
> >> >>   218 root        1  96    0  3852K  1376K select 0   0:23  0.00% 
syslogd
> >> >>  2171 root        1  96    0 30676K  4224K select 6   0:19  0.00% sshd
> >> >>
> >> >> Actually I was doing a network performance testing on this system with
> >> >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used a
> >> >> tool to generate big amount of traffic around 600Mbps-700Mbps
> >> >> traversing the FreeBSD system in bi-direction, meaning both network
> >> >> interfaces are receiving traffic. What happened was, the CPU (cpu7)
> >> >> that handles the (irq 23) on both interfaces consumed big amount of
> >> >> CPU utilization around 65.53% in which it affects other running
> >> >> applications and services like sshd and httpd. It's no longer
> >> >> accessible when traffic is bombarded. With the current situation of my
> >> >> FreeBSD system with only one CPU being stressed, I was thinking of
> >> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I thought
> >> >> my concern has something to do with the distributions of load on
> >> >> multiple CPU cores handled by the scheduler especially at the network
> >> >> level, processing network load. So, if it is more of interrupt
> >> >> handling and not on the scheduler, is there a way we can optimize it?
> >> >> Because if it still routed only to one CPU then for me it's still
> >> >> inefficient. Who handles interrupt scheduling for bounding CPU in
> >> >> order to prevent shared IRQ? Is there any improvements with
> >> >> FreeBSD-7.0 with regards to interrupt handling?
> >> >
> >> > It depends.  In all likelihood, the interrupts from bce0 and bce1 are 
both
> >> > hardwired to the same interrupt pin and so they will always share the 
same
> >> > ithread when using the legacy INTx interrupts.  However, bce(4) parts 
do
> >> > support MSI, and if you try a newer OS snap (6.3 or later) these 
devices
> >> > should use MSI in which case each NIC would be assigned to a separate 
CPU.
> > I
> >> > would suggest trying 7.0 or a 7.1 release candidate and see if it does
> >> > better.
> >> >
> >> > --
> >> > John Baldwin
> >> >
> >>
> >> Hi John,
> >>
> >> I try 7.0 release and each network interface were already allocated
> >> separately on different CPU. Here, MSI is already working.
> >>
> >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >>    12 root        1 171 ki31     0K    16K CPU6   6 123:55 100.00% idle:
> > cpu6
> >>    15 root        1 171 ki31     0K    16K CPU3   3 123:54 100.00% idle:
> > cpu3
> >>    14 root        1 171 ki31     0K    16K CPU4   4 123:26 100.00% idle:
> > cpu4
> >>    16 root        1 171 ki31     0K    16K CPU2   2 123:15 100.00% idle:
> > cpu2
> >>    17 root        1 171 ki31     0K    16K CPU1   1 123:15 100.00% idle:
> > cpu1
> >>    37 root        1 -68    -     0K    16K CPU7   7   9:09 100.00% 
irq256:
> > bce0
> >>    13 root        1 171 ki31     0K    16K CPU5   5 123:49 99.07% idle: 
cpu5
> >>    40 root        1 -68    -     0K    16K WAIT   0   4:40 51.17% irq257:
> > bce1
> >>    18 root        1 171 ki31     0K    16K RUN    0 117:48 49.37% idle: 
cpu0
> >>    11 root        1 171 ki31     0K    16K RUN    7 115:25  0.00% idle: 
cpu7
> >>    19 root        1 -32    -     0K    16K WAIT   0   0:39  0.00% swi4:
> > clock s
> >> 14367 root        1  44    0  5176K  3104K select 2   0:01  0.00% dhcpd
> >>    22 root        1 -16    -     0K    16K -      3   0:01  0.00% yarrow
> >>    25 root        1 -24    -     0K    16K WAIT   0   0:00  0.00% swi6:
> > Giant t
> >> 11658 root        1  44    0 32936K  4540K select 1   0:00  0.00% sshd
> >> 14224 root        1  44    0 32936K  4540K select 5   0:00  0.00% sshd
> >>    41 root        1 -60    -     0K    16K WAIT   0   0:00  0.00% irq1:
> > atkbd0
> >>     4 root        1  -8    -     0K    16K -      2   0:00  0.00% g_down
> >>
> >> The bce0 interface interrupt (irq256) gets stressed out which already
> >> have 100% of CPU7 while CPU0 is around 51.17%. Any more
> >> recommendations? Is there anything we can do about optimization with
> >> MSI?
> >
> > Well, on 7.x you can try turning net.isr.direct off (sysctl).  However, it
> > seems you are hammering your bce0 interface.  You might want to try using
> > polling on bce0 and seeing if it keeps up with the traffic better.
> >
> > --
> > John Baldwin
> >
> 
> With net.isr.direct=0, my IBM system lessens CPU utilization per
> interface (bce0 and bce1) but swi1:net increase its utilization.
> Can you explained what's happening here? What does net.isr.direct do
> with the decrease of CPU utilization on its interface? I really wanted
> to know what happened internally during the packets being processed
> and received by the interfaces then to the device interrupt up to the
> software interrupt level because I am confused when enabling/disabling
> net.isr.direct in sysctl. Is there a tool that can we used to trace
> this process just to be able to know which part of the kernel internal
> is doing the bottleneck especially when net.isr.direct=1? By the way
> with device polling enabled, the system experienced packet errors and
> the interface throughput is worst, so I avoid using it though.
> 
>    PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> 
>    16 root        1 171 ki31     0K    16K CPU10  a  86:06 89.06% idle: 
cpu10
>    27 root        1 -44    -     0K    16K CPU1   1  34:37 82.67% swi1: net
>    52 root        1 -68    -     0K    16K WAIT   b  51:59 59.77% irq32: 
bce1
>    15 root        1 171 ki31     0K    16K RUN    b  69:28 43.16% idle: 
cpu11
>    25 root        1 171 ki31     0K    16K RUN    1 115:35 24.27% idle: cpu1
>    51 root        1 -68    -     0K    16K CPU10  a  35:21 13.48% irq31: 
bce0

With net.isr.direct=1, the ithread tries to pass the received packets up to 
IP/UDP/TCP/socket directly.  With net.isr.direct=0, the ithread places 
received packets on a queue and sends a signal to 'sw1: net'.  The swi thread 
wakes up, pulls the packets off of the queue and sends them to 
IP/UDP/TCP/socket.

-- 
John Baldwin

From owner-freebsd-smp@FreeBSD.ORG  Mon Nov 17 21:13:57 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 083691065678
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 21:13:57 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from server.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
	[IPv6:2001:470:1f10:75::2])
	by mx1.freebsd.org (Postfix) with ESMTP id DD8398FC19
	for <freebsd-smp@freebsd.org>; Mon, 17 Nov 2008 21:13:55 +0000 (UTC)
	(envelope-from jhb@freebsd.org)
Received: from localhost.corp.yahoo.com (john@localhost [IPv6:::1])
	(authenticated bits=0)
	by server.baldwin.cx (8.14.3/8.14.3) with ESMTP id mAHLDgus033788;
	Mon, 17 Nov 2008 16:13:48 -0500 (EST) (envelope-from jhb@freebsd.org)
From: John Baldwin <jhb@freebsd.org>
To: "Archimedes Gaviola" <archimedes.gaviola@gmail.com>
Date: Mon, 17 Nov 2008 16:09:54 -0500
User-Agent: KMail/1.9.7
References: <42e3d810811100033w172e90dbl209ecbab640cc24f@mail.gmail.com>
	<42e3d810811170311uddc77daj176bc285722a0c8@mail.gmail.com>
	<42e3d810811170336rf0a0357sf32035e8bd1489e9@mail.gmail.com>
In-Reply-To: <42e3d810811170336rf0a0357sf32035e8bd1489e9@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200811171609.54527.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-2.0.2 (server.baldwin.cx [IPv6:::1]);
	Mon, 17 Nov 2008 16:13:49 -0500 (EST)
X-Virus-Scanned: ClamAV 0.93.1/8642/Sun Nov 16 23:01:08 2008 on
	server.baldwin.cx
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00,NO_RELAYS 
	autolearn=ham version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx
Cc: freebsd-smp@freebsd.org
Subject: Re: CPU affinity with ULE scheduler
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Nov 2008 21:13:57 -0000

On Monday 17 November 2008 06:36:40 am Archimedes Gaviola wrote:
> On Mon, Nov 17, 2008 at 7:11 PM, Archimedes Gaviola
> <archimedes.gaviola@gmail.com> wrote:
> > On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin <jhb@freebsd.org> wrote:
> >> On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote:
> >>> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin <jhb@freebsd.org> wrote:
> >>> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote:
> >>> >> On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin <jhb@freebsd.org> 
wrote:
> >>> >> > On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote:
> >>> >> >> To Whom It May Concerned:
> >>> >> >>
> >>> >> >> Can someone explain or share about ULE scheduler (latest version 2 
if
> >>> >> >> I'm not mistaken) dealing with CPU affinity? Is there any existing
> >>> >> >> benchmarks on this with FreeBSD? Because I am currently using 4BSD
> >>> >> >> scheduler and as what I have observed especially on processing 
high
> >>> >> >> network load traffic on multiple CPU cores, only one CPU were 
being
> >>> >> >> stressed with network interrupt while the rests are mostly in idle
> >>> >> >> state. This is an AMD-64 (4x) dual-core IBM system with GigE 
Broadcom
> >>> >> >> network interface cards (bce0 and bce1). Below is the snapshot of 
the
> >>> >> >> case.
> >>> >> >
> >>> >> > Interrupts are routed to a single CPU.  Since bce0 and bce1 are 
both on
> >>> > the
> >>> >> > same interrupt (irq 23), the CPU that interrupt is routed to is 
going
> >> to
> >>> > end
> >>> >> > up handling all the interrupts for bce0 and bce1.  This not 
something
> >> ULE
> >>> > or
> >>> >> > 4BSD have any control over.
> >>> >> >
> >>> >> > --
> >>> >> > John Baldwin
> >>> >> >
> >>> >>
> >>> >> Hi John,
> >>> >>
> >>> >> I'm sorry for the wrong snapshot. Here's the right one with my 
concern.
> >>> >>
> >>> >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU 
COMMAND
> >>> >>    17 root        1 171   52     0K    16K CPU0   0  54:28 95.17% 
idle:
> >> cpu0
> >>> >>    15 root        1 171   52     0K    16K CPU2   2  55:55 93.65% 
idle:
> >> cpu2
> >>> >>    14 root        1 171   52     0K    16K CPU3   3  58:53 93.55% 
idle:
> >> cpu3
> >>> >>    13 root        1 171   52     0K    16K RUN    4  59:14 82.47% 
idle:
> >> cpu4
> >>> >>    12 root        1 171   52     0K    16K RUN    5  55:42 82.23% 
idle:
> >> cpu5
> >>> >>    16 root        1 171   52     0K    16K CPU1   1  58:13 77.78% 
idle:
> >> cpu1
> >>> >>    11 root        1 171   52     0K    16K CPU6   6  54:08 76.17% 
idle:
> >> cpu6
> >>> >>    36 root        1 -68 -187     0K    16K WAIT   7   8:50 65.53%
> >>> >> irq23: bce0 bce1
> >>> >>    10 root        1 171   52     0K    16K CPU7   7  48:19 29.79% 
idle:
> >> cpu7
> >>> >>    43 root        1 171   52     0K    16K pgzero 2   0:35  1.51%
> >> pagezero
> >>> >>  1372 root       10  20    0 16716K  5764K kserel 6  58:42  0.00% kmd
> >>> >>  4488 root        1  96    0 30676K  4236K select 2   1:51  0.00% 
sshd
> >>> >>    18 root        1 -32 -151     0K    16K WAIT   0   1:14  0.00% 
swi4:
> >>> > clock s
> >>> >>    20 root        1 -44 -163     0K    16K WAIT   0   0:30  0.00% 
swi1:
> >> net
> >>> >>   218 root        1  96    0  3852K  1376K select 0   0:23  0.00% 
syslogd
> >>> >>  2171 root        1  96    0 30676K  4224K select 6   0:19  0.00% 
sshd
> >>> >>
> >>> >> Actually I was doing a network performance testing on this system 
with
> >>> >> FreeBSD-6.2 RELEASE using its default scheduler 4BSD and then I used 
a
> >>> >> tool to generate big amount of traffic around 600Mbps-700Mbps
> >>> >> traversing the FreeBSD system in bi-direction, meaning both network
> >>> >> interfaces are receiving traffic. What happened was, the CPU (cpu7)
> >>> >> that handles the (irq 23) on both interfaces consumed big amount of
> >>> >> CPU utilization around 65.53% in which it affects other running
> >>> >> applications and services like sshd and httpd. It's no longer
> >>> >> accessible when traffic is bombarded. With the current situation of 
my
> >>> >> FreeBSD system with only one CPU being stressed, I was thinking of
> >>> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler because I 
thought
> >>> >> my concern has something to do with the distributions of load on
> >>> >> multiple CPU cores handled by the scheduler especially at the network
> >>> >> level, processing network load. So, if it is more of interrupt
> >>> >> handling and not on the scheduler, is there a way we can optimize it?
> >>> >> Because if it still routed only to one CPU then for me it's still
> >>> >> inefficient. Who handles interrupt scheduling for bounding CPU in
> >>> >> order to prevent shared IRQ? Is there any improvements with
> >>> >> FreeBSD-7.0 with regards to interrupt handling?
> >>> >
> >>> > It depends.  In all likelihood, the interrupts from bce0 and bce1 are 
both
> >>> > hardwired to the same interrupt pin and so they will always share the 
same
> >>> > ithread when using the legacy INTx interrupts.  However, bce(4) parts 
do
> >>> > support MSI, and if you try a newer OS snap (6.3 or later) these 
devices
> >>> > should use MSI in which case each NIC would be assigned to a separate 
CPU.
> >> I
> >>> > would suggest trying 7.0 or a 7.1 release candidate and see if it does
> >>> > better.
> >>> >
> >>> > --
> >>> > John Baldwin
> >>> >
> >>>
> >>> Hi John,
> >>>
> >>> I try 7.0 release and each network interface were already allocated
> >>> separately on different CPU. Here, MSI is already working.
> >>>
> >>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU 
COMMAND
> >>>    12 root        1 171 ki31     0K    16K CPU6   6 123:55 100.00% idle:
> >> cpu6
> >>>    15 root        1 171 ki31     0K    16K CPU3   3 123:54 100.00% idle:
> >> cpu3
> >>>    14 root        1 171 ki31     0K    16K CPU4   4 123:26 100.00% idle:
> >> cpu4
> >>>    16 root        1 171 ki31     0K    16K CPU2   2 123:15 100.00% idle:
> >> cpu2
> >>>    17 root        1 171 ki31     0K    16K CPU1   1 123:15 100.00% idle:
> >> cpu1
> >>>    37 root        1 -68    -     0K    16K CPU7   7   9:09 100.00% 
irq256:
> >> bce0
> >>>    13 root        1 171 ki31     0K    16K CPU5   5 123:49 99.07% idle: 
cpu5
> >>>    40 root        1 -68    -     0K    16K WAIT   0   4:40 51.17% 
irq257:
> >> bce1
> >>>    18 root        1 171 ki31     0K    16K RUN    0 117:48 49.37% idle: 
cpu0
> >>>    11 root        1 171 ki31     0K    16K RUN    7 115:25  0.00% idle: 
cpu7
> >>>    19 root        1 -32    -     0K    16K WAIT   0   0:39  0.00% swi4:
> >> clock s
> >>> 14367 root        1  44    0  5176K  3104K select 2   0:01  0.00% dhcpd
> >>>    22 root        1 -16    -     0K    16K -      3   0:01  0.00% yarrow
> >>>    25 root        1 -24    -     0K    16K WAIT   0   0:00  0.00% swi6:
> >> Giant t
> >>> 11658 root        1  44    0 32936K  4540K select 1   0:00  0.00% sshd
> >>> 14224 root        1  44    0 32936K  4540K select 5   0:00  0.00% sshd
> >>>    41 root        1 -60    -     0K    16K WAIT   0   0:00  0.00% irq1:
> >> atkbd0
> >>>     4 root        1  -8    -     0K    16K -      2   0:00  0.00% g_down
> >>>
> >>> The bce0 interface interrupt (irq256) gets stressed out which already
> >>> have 100% of CPU7 while CPU0 is around 51.17%. Any more
> >>> recommendations? Is there anything we can do about optimization with
> >>> MSI?
> >>
> >> Well, on 7.x you can try turning net.isr.direct off (sysctl).  However, 
it
> >> seems you are hammering your bce0 interface.  You might want to try using
> >> polling on bce0 and seeing if it keeps up with the traffic better.
> >>
> >> --
> >> John Baldwin
> >>
> >
> > With net.isr.direct=0, my IBM system lessens CPU utilization per
> > interface (bce0 and bce1) but swi1:net increase its utilization.
> > Can you explained what's happening here? What does net.isr.direct do
> > with the decrease of CPU utilization on its interface? I really wanted
> > to know what happened internally during the packets being processed
> > and received by the interfaces then to the device interrupt up to the
> > software interrupt level because I am confused when enabling/disabling
> > net.isr.direct in sysctl. Is there a tool that can we used to trace
> > this process just to be able to know which part of the kernel internal
> > is doing the bottleneck especially when net.isr.direct=1? By the way
> > with device polling enabled, the system experienced packet errors and
> > the interface throughput is worst, so I avoid using it though.
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >
> >   16 root        1 171 ki31     0K    16K CPU10  a  86:06 89.06% idle: 
cpu10
> >   27 root        1 -44    -     0K    16K CPU1   1  34:37 82.67% swi1: net
> >   52 root        1 -68    -     0K    16K WAIT   b  51:59 59.77% irq32: 
bce1
> >   15 root        1 171 ki31     0K    16K RUN    b  69:28 43.16% idle: 
cpu11
> >   25 root        1 171 ki31     0K    16K RUN    1 115:35 24.27% idle: 
cpu1
> >   51 root        1 -68    -     0K    16K CPU10  a  35:21 13.48% irq31: 
bce0
> >
> >
> > Regards,
> > Archimedes
> >
> 
> One more thing, I observed that when net.isr.direct=1, bce0 is using
> irq256 and bce1 is using irq257 while net.isr.direct=0, bce0 is now
> using irq31 and bce1 is using irq32. What makes it different?

That is not from net.isr.direcct.  irq256/257 is when the bce devices are 
using MSI.  irq31/32 is when the bce devices are using INTx.

-- 
John Baldwin

From owner-freebsd-smp@FreeBSD.ORG  Wed Nov 19 11:42:02 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DC6F0106564A;
	Wed, 19 Nov 2008 11:42:02 +0000 (UTC)
	(envelope-from takawata@init-main.com)
Received: from sana.init-main.com (unknown [IPv6:2001:240:28::1])
	by mx1.freebsd.org (Postfix) with ESMTP id 761798FC17;
	Wed, 19 Nov 2008 11:42:02 +0000 (UTC)
	(envelope-from takawata@init-main.com)
Received: from init-main.com (localhost [127.0.0.1])
	by sana.init-main.com (8.14.3/8.14.3) with ESMTP id mAJBi3Lg004559;
	Wed, 19 Nov 2008 20:44:03 +0900 (JST)
	(envelope-from takawata@init-main.com)
Message-Id: <200811191144.mAJBi3Lg004559@sana.init-main.com>
To: freebsd-current@freebsd.org, freebsd-hackers@freebsd.org,
	freebsd-smp@freebsd.org
Date: Wed, 19 Nov 2008 20:44:03 +0900
From: Takanori Watanabe <takawata@init-main.com>
Cc: 
Subject: Core i7 anyone else?
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Nov 2008 11:42:03 -0000

Hi, I recently bought Core i7 machine(for 145,000JPY: about $1500)
and sometimes hangs up oddly.
When in the state, some specific process only works and 
replys ping, but not reply any useful information.

I suspect it may caused by CPU power management, so I cut 
almost all CPU power management feature on BIOS parameter.

Are there any people encouterd such trouble?
And on this machine build world in SCHED_ULE(15min.) is slower 
than SCHED_4BSD(12min.).


===dmesg===
http://www.init-main.com/corei7.dmesg
or
http://pastebin.com/m187f77aa
(if host is down)

=====DSDT====
http://www.init-main.com/corei7.asl
or
http://pastebin.com/m6879984a

==some sysctls==
hw.machine: i386
hw.model: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
hw.ncpu: 8
hw.byteorder: 1234
hw.physmem: 3202322432
hw.usermem: 2956083200
hw.pagesize: 4096
hw.floatingpoint: 1
hw.machine_arch: i386
hw.realmem: 3211264000
==
machdep.enable_panic_key: 0
machdep.adjkerntz: -32400
machdep.wall_cmos_clock: 1
machdep.disable_rtc_set: 0
machdep.disable_mtrrs: 0
machdep.guessed_bootdev: 2686451712
machdep.idle: acpi
machdep.idle_available: spin, mwait, mwait_hlt, hlt, acpi, 
machdep.hlt_cpus: 0
machdep.prot_fault_translation: 0
machdep.panic_on_nmi: 1
machdep.kdb_on_nmi: 1
machdep.tsc_freq: 2684011396
machdep.i8254_freq: 1193182
machdep.acpi_timer_freq: 3579545
machdep.acpi_root: 1024240
machdep.hlt_logical_cpus: 0
machdep.logical_cpus_mask: 254
machdep.hyperthreading_allowed: 1
==
kern.sched.preemption: 0
kern.sched.topology_spec: <groups>
 <group level="1" cache-level="0">
  <cpu count="8" mask="0xff">0, 1, 2, 3, 4, 5, 6, 7</cpu>
  <flags></flags>
 </group>
</groups>

kern.sched.steal_thresh: 3
kern.sched.steal_idle: 1
kern.sched.steal_htt: 1
kern.sched.balance_interval: 133
kern.sched.balance: 1
kern.sched.affinity: 1
kern.sched.idlespinthresh: 4
kern.sched.idlespins: 10000
kern.sched.static_boost: 160
kern.sched.preempt_thresh: 0
kern.sched.interact: 30
kern.sched.slice: 13
kern.sched.name: ULE
===

From owner-freebsd-smp@FreeBSD.ORG  Wed Nov 19 11:57:57 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9C9411065672
	for <freebsd-smp@freebsd.org>; Wed, 19 Nov 2008 11:57:57 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from QMTA03.westchester.pa.mail.comcast.net
	(qmta03.westchester.pa.mail.comcast.net [76.96.62.32])
	by mx1.freebsd.org (Postfix) with ESMTP id 422608FC1E
	for <freebsd-smp@freebsd.org>; Wed, 19 Nov 2008 11:57:56 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from OMTA08.westchester.pa.mail.comcast.net ([76.96.62.12])
	by QMTA03.westchester.pa.mail.comcast.net with comcast
	id gzjh1a00N0Fqzac53zmKKN; Wed, 19 Nov 2008 11:46:19 +0000
Received: from koitsu.dyndns.org ([69.181.141.110])
	by OMTA08.westchester.pa.mail.comcast.net with comcast
	id gznE1a00Q2P6wsM3UznFzA; Wed, 19 Nov 2008 11:47:16 +0000
X-Authority-Analysis: v=1.0 c=1 a=DiZ76wm4AAAA:8 a=fGO4tVQLAAAA:8
	a=6I5d2MoRAAAA:8 a=QycZ5dHgAAAA:8 a=8oBBsJxBPgm8k0VuyQAA:9
	a=CmO0HorRqYcm4XbtWS8pjeXFcQ8A:4 a=EoioJ0NPDVgA:10 a=LY0hPdMaydYA:10
Received: by icarus.home.lan (Postfix, from userid 1000)
	id B30E933C36; Wed, 19 Nov 2008 03:47:14 -0800 (PST)
Date: Wed, 19 Nov 2008 03:47:14 -0800
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: Takanori Watanabe <takawata@init-main.com>
Message-ID: <20081119114714.GA85533@icarus.home.lan>
References: <200811191144.mAJBi3Lg004559@sana.init-main.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200811191144.mAJBi3Lg004559@sana.init-main.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Cc: freebsd-hackers@freebsd.org, freebsd-current@freebsd.org,
	freebsd-smp@freebsd.org
Subject: Re: Core i7 anyone else?
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Nov 2008 11:57:57 -0000

On Wed, Nov 19, 2008 at 08:44:03PM +0900, Takanori Watanabe wrote:
> Hi, I recently bought Core i7 machine(for 145,000JPY: about $1500)
> and sometimes hangs up oddly.
> When in the state, some specific process only works and 
> replys ping, but not reply any useful information.
> 
> I suspect it may caused by CPU power management, so I cut 
> almost all CPU power management feature on BIOS parameter.
> 
> Are there any people encouterd such trouble?
> And on this machine build world in SCHED_ULE(15min.) is slower 
> than SCHED_4BSD(12min.).
> 
> 
> ===dmesg===
> http://www.init-main.com/corei7.dmesg
> or
> http://pastebin.com/m187f77aa
> (if host is down)
> 
> =====DSDT====
> http://www.init-main.com/corei7.asl
> or
> http://pastebin.com/m6879984a
> 
> ==some sysctls==
> hw.machine: i386
> hw.model: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz
> hw.ncpu: 8
> hw.byteorder: 1234
> hw.physmem: 3202322432
> hw.usermem: 2956083200
> hw.pagesize: 4096
> hw.floatingpoint: 1
> hw.machine_arch: i386
> hw.realmem: 3211264000
> ==
> machdep.enable_panic_key: 0
> machdep.adjkerntz: -32400
> machdep.wall_cmos_clock: 1
> machdep.disable_rtc_set: 0
> machdep.disable_mtrrs: 0
> machdep.guessed_bootdev: 2686451712
> machdep.idle: acpi
> machdep.idle_available: spin, mwait, mwait_hlt, hlt, acpi, 
> machdep.hlt_cpus: 0
> machdep.prot_fault_translation: 0
> machdep.panic_on_nmi: 1
> machdep.kdb_on_nmi: 1
> machdep.tsc_freq: 2684011396
> machdep.i8254_freq: 1193182
> machdep.acpi_timer_freq: 3579545
> machdep.acpi_root: 1024240
> machdep.hlt_logical_cpus: 0
> machdep.logical_cpus_mask: 254
> machdep.hyperthreading_allowed: 1
> ==
> kern.sched.preemption: 0
> kern.sched.topology_spec: <groups>
>  <group level="1" cache-level="0">
>   <cpu count="8" mask="0xff">0, 1, 2, 3, 4, 5, 6, 7</cpu>
>   <flags></flags>
>  </group>
> </groups>
> 
> kern.sched.steal_thresh: 3
> kern.sched.steal_idle: 1
> kern.sched.steal_htt: 1
> kern.sched.balance_interval: 133
> kern.sched.balance: 1
> kern.sched.affinity: 1
> kern.sched.idlespinthresh: 4
> kern.sched.idlespins: 10000
> kern.sched.static_boost: 160
> kern.sched.preempt_thresh: 0
> kern.sched.interact: 30
> kern.sched.slice: 13
> kern.sched.name: ULE
> ===

When building world/kernel, do you see odd behaviour (on CURRENT) such
as the load average being absurdly high, or processes (anything; sh,
make, mutt, etc.) getting stuck in bizarre states?  These things are
what caused my buildworld/buildkernel times to increase (compared to
RELENG_7).  I was using ULE entirely (on CURRENT and RELENG_7), but
did not try 4BSD.  I documented my experience.

http://wiki.freebsd.org/JeremyChadwick/Bizarre_CURRENT_experience

I have no idea if your problem is the same as mine.  This is purely
speculative on my part.  (And readers of that Wiki article should
note that the problem was not hardware-related)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


From owner-freebsd-smp@FreeBSD.ORG  Wed Nov 19 12:05:11 2008
Return-Path: <owner-freebsd-smp@FreeBSD.ORG>
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 83ECE106567C
	for <freebsd-smp@freebsd.org>; Wed, 19 Nov 2008 12:05:11 +0000 (UTC)
	(envelope-from freebsd-smp@m.gmane.org)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 043238FC18
	for <freebsd-smp@freebsd.org>; Wed, 19 Nov 2008 12:05:10 +0000 (UTC)
	(envelope-from freebsd-smp@m.gmane.org)
Received: from root by ciao.gmane.org with local (Exim 4.43)
	id 1L2lnr-0005ay-4g
	for freebsd-smp@freebsd.org; Wed, 19 Nov 2008 12:05:04 +0000
Received: from lara.cc.fer.hr ([161.53.72.113])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-smp@freebsd.org>; Wed, 19 Nov 2008 12:05:03 +0000
Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-smp@freebsd.org>; Wed, 19 Nov 2008 12:05:03 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-smp@freebsd.org
From: Ivan Voras <ivoras@freebsd.org>
Date: Wed, 19 Nov 2008 12:58:54 +0100
Lines: 92
Message-ID: <4923FF7E.1080101@freebsd.org>
References: <200811191144.mAJBi3Lg004559@sana.init-main.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enigCFC1809BFB1D0993DBF70FC6"
X-Complaints-To: usenet@ger.gmane.org
X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr
User-Agent: Thunderbird 2.0.0.17 (X11/20080925)
In-Reply-To: <200811191144.mAJBi3Lg004559@sana.init-main.com>
X-Enigmail-Version: 0.95.0
Sender: news <news@ger.gmane.org>
Cc: freebsd-hackers@freebsd.org, freebsd-current@freebsd.org,
	freebsd-smp@freebsd.org
Subject: Re: Core i7 anyone else?
X-BeenThere: freebsd-smp@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD SMP implementation group <freebsd-smp.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-smp>
List-Post: <mailto:freebsd-smp@freebsd.org>
List-Help: <mailto:freebsd-smp-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-smp>,
	<mailto:freebsd-smp-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Nov 2008 12:05:11 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enigCFC1809BFB1D0993DBF70FC6
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Takanori Watanabe wrote:
> Hi, I recently bought Core i7 machine(for 145,000JPY: about $1500)
> and sometimes hangs up oddly.
> When in the state, some specific process only works and=20
> replys ping, but not reply any useful information.
>=20
> I suspect it may caused by CPU power management, so I cut=20
> almost all CPU power management feature on BIOS parameter.
>=20
> Are there any people encouterd such trouble?
> And on this machine build world in SCHED_ULE(15min.) is slower=20
> than SCHED_4BSD(12min.).


I don't know but this:

> =3D=3D=3Ddmesg=3D=3D=3D
> http://www.init-main.com/corei7.dmesg
> or
> http://pastebin.com/m187f77aa
> (if host is down)

CPU: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz (2684.00-MHz
686-class CPU)
  Origin =3D "GenuineIntel"  Id =3D 0x106a4  Stepping =3D 4

Features=3D0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PG=
E,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>

Features2=3D0x98e3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,P=
DCM,SSE4.1,SSE4.2,POPCNT>
  AMD Features=3D0x28100000<NX,RDTSCP,LM>
  AMD Features2=3D0x1<LAHF>
  Cores per package: 8
  Logical CPUs per core: 2
real memory  =3D 3211264000 (3062 MB)
avail memory =3D 3143983104 (2998 MB)
ACPI APIC Table: <7522MS A7522100>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7

is a bit in conflict with this:

> kern.sched.topology_spec: <groups>
>  <group level=3D"1" cache-level=3D"0">
>   <cpu count=3D"8" mask=3D"0xff">0, 1, 2, 3, 4, 5, 6, 7</cpu>
>   <flags></flags>
>  </group>
> </groups>

=46rom what I know of its architecture i7 has hyperthreading - i.e. the
CPU has 4 "real" cores which are hyperthreaded, so you get 8 cores
total. It probably also includes a different way of enumerating its
topology which might have caused wrong topology detection and your
slowdown in buildworld. (the CPU also has L3 cache, but I think it's not
looked up in topology detection).

I don't know it this particular error could be responsible for your
lockups - probably not. The CPU also introduces some big changes in
power management (dynamic powerdown of individual cores) which could
cause them - but I can't help you there.

Are you sure it's not something trivial like overheating?


--------------enigCFC1809BFB1D0993DBF70FC6
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJI/9+ldnAQVacBcgRAptBAKCvy5iMZkVJ7f/v/8jWVRvs0Oa1vwCgnlPY
fl3ySAZXU5NXl0ZmOXf43t4=
=hTDW
-----END PGP SIGNATURE-----

--------------enigCFC1809BFB1D0993DBF70FC6--