From: Alexander Motin
Date: Mon, 06 Feb 2012 09:04:31 +0200
To: freebsd-hackers@freebsd.org
Subject: [RFT][patch] Scheduling for HTT and not only
Message-ID: <4F2F7B7F.40508@FreeBSD.org>

Hi.

I've analyzed scheduler behavior and think I've found the problem with HTT.
SCHED_ULE knows about HTT and does the right thing when it does its load
balancing once a second. Unfortunately, if some other thread gets in the
way, a process can easily be pushed out to another CPU, where it will stay
for another second because of CPU affinity, possibly sharing a physical
core with something else without any need. I've made a patch reworking the
SCHED_ULE affinity code to fix that:

http://people.freebsd.org/~mav/sched.htt.patch

This patch does three things:
 - Disables the strict affinity optimization when HTT is detected, to let
   the more sophisticated code take the load of the other logical core(s)
   into account.
 - Adds affinity support to the sched_lowest() function to prefer the
   specified (last used) CPU, and the CPU groups it belongs to, in case of
   equal load. The previous code always selected the first valid CPU among
   equals, which caused threads to migrate toward lower-numbered CPUs
   without need. (A simplified sketch of this tie-breaking follows the
   list.)
 - If the current CPU group has no CPU where the process can run now at its
   priority, sequentially check the parent CPU groups before doing a global
   search. That should improve affinity for the next cache levels.
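To illustrate the second item, here is a minimal, self-contained sketch of
the tie-breaking idea. It is not the actual patch: the cpu_load structure,
the lowest_cpu() helper and the "prefer" parameter are hypothetical
simplifications standing in for the real sched_lowest()/cpu_group code in
sched_ule.c.

	#include <limits.h>

	struct cpu_load {
		int	cpu;	/* logical CPU id */
		int	load;	/* runnable threads queued on it */
	};

	/*
	 * Return the index of the least loaded CPU.  On a tie, keep the
	 * thread's last-used CPU ("prefer") instead of always taking the
	 * first (lowest-numbered) candidate.
	 */
	static int
	lowest_cpu(const struct cpu_load *cpus, int ncpus, int prefer)
	{
		int best = -1, bestload = INT_MAX;

		for (int i = 0; i < ncpus; i++) {
			if (cpus[i].load < bestload ||
			    (cpus[i].load == bestload &&
			    cpus[i].cpu == prefer)) {
				best = i;
				bestload = cpus[i].load;
			}
		}
		return (best);
	}

Without the "prefer" check, a thread whose last CPU ties with a
lower-numbered CPU is still moved there and loses its cache affinity.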
I've run several different benchmarks to test it, and so far the results
look promising:
 - On an Atom D525 (2 physical cores + HTT) I tested HTTP receive with
   fetch and FTP transmit with ftpd. On receive I got 103MB/s on the
   interface; on transmit somewhat less, about 85MB/s. In both cases the
   scheduler kept the interrupt thread and the application on different
   physical cores. Without the patch, receive speed fluctuated between
   about 80 and 103MB/s, while transmit stayed at about 85MB/s.
 - On the same Atom I tested TCP speed with iperf and got mostly the same
   results:
   - receive to the Atom: 755-765Mbit/s with the patch, 531-765Mbit/s
     without it;
   - transmit from the Atom: 679Mbit/s in both cases.
   I think the fluctuating receive behavior in both tests can be explained
   by some heavy callout handled by the swi4:clock process, invoked on
   receive (seen in top and schedgraph) but not on transmit. It may be
   specific to the Realtek NIC driver.
 - On the same Atom I tested 512-byte reads from an SSD with dd in 1 and
   32 streams. I found no regressions, but no benefits either: with one
   stream there is no congestion, and with multiple streams all cores are
   congested.
 - On a Core i7-2600K (4 physical cores + HTT) I've run more than 20
   `make buildworld`s with different -j values (1, 2, 4, 6, 8, 12, 16) for
   both the original and the patched kernel. I found no performance
   regressions, while for -j4 I got a 10% improvement:

# ministat -w 65 res4A res4B
x res4A
+ res4B
    N           Min           Max        Median           Avg        Stddev
x   3       1554.86       1617.43       1571.62     1581.3033     32.389449
+   3       1420.69        1423.1       1421.36     1421.7167     1.2439587
Difference at 95.0% confidence
        -159.587 ± 51.9496
        -10.0921% ± 3.28524%
        (Student's t, pooled s = 22.9197)

and for -j6 a 3.6% improvement:

# ministat -w 65 res6A res6B
x res6A
+ res6B
    N           Min           Max        Median           Avg        Stddev
x   3       1381.17       1402.94        1400.3     1394.8033     11.880372
+   3        1340.4       1349.34       1341.23     1343.6567     4.9393758
Difference at 95.0% confidence
        -51.1467 ± 20.6211
        -3.66694% ± 1.47842%
        (Student's t, pooled s = 9.09782)

Who wants to do independent testing to verify my results or run some more
interesting benchmarks? :)

PS: Sponsored by iXsystems, Inc.

--
Alexander Motin