Date: Wed, 8 Sep 2004 21:45:27 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200409090445.i894jRei071606@apollo.backplane.com>
To: Robert Watson
Cc: Gerrit Nagelhout, current@freebsd.org, slong@freebsd.org
Subject: Re: FreeBSD 5.3 Bridge performance take II

:In the rwatson_umaperthread branch, what I've done is started to associate
:struct uma_cache structures with threads.  Since caches are "per-zone", I
:allow threads to register for zones of interest; these caches are hung off
:of struct thread, and must be explicitly registered and released.  While
:..
:
:In practice, this eliminates mutex acquisition for mbuf allocation and
:free in the forwarding and bridging paths, and halves the number of
:operations when interacting with user threads (as they don't have the
:..
:
:My interest in looking at per-thread caches was to explore ways in which
:to reduce the cost of zone allocation without making modifications to our
:synchronization model.  It has been proposed that a better way to achieve
:...
:
:Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
:robert@fledge.watson.org      Principal Research Scientist, McAfee Research

    I would recommend against per-thread caches.  Instead, make the per-cpu
    caches actually *be* per-cpu (that is, not require a mutex).  This is
    what I do in DragonFly's Slab allocator.  For the life of me I just
    don't understand why one would spend so much effort creating a per-cpu
    caching subsystem and then slap a mutex right smack in the middle of
    the critical allocation and deallocation paths.

    Non-critical operations, such as high-level zone management, can be
    done passively (in DragonFly's case through IPI messaging which, when
    I get to it, can be queued passively rather than actively), or by a
    helper thread which migrates to the cpu whose cache it needs to
    operate on, does its stuff, then migrates to the next cpu, or by any
    number of other clever mechanisms, none of which require a brute-force
    mutex to access the data.

    I use this cpu migration trick for a number of things in DragonFly.
    Jeff and I use it for wildcard pcb registration (which is replicated
    across cpus).  The thread list sysctl code collects per-cpu thread
    data by iterating through the cpus (migrating the thread to each cpu
    to collect the data and then ending up on the cpu it began on before
    returning to user mode).
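    A rough sketch of what "actually be per-cpu" means for the allocation
    fast path (names here are illustrative only, not actual DragonFly or
    FreeBSD code: crit_enter()/crit_exit() stand in for whatever blocks
    preemption and cpu migration, mycpuid for the current cpu's index, and
    the MAXCPU/slow-path pieces are made up for the example):

	/*
	 * Sketch only: crit_enter()/crit_exit() and mycpuid are assumed
	 * kernel primitives (block preemption/migration, current cpu id).
	 */
	#define MAXCPU	16			/* illustrative sizing only */

	struct pcpu_cache {
		void	*freelist;		/* free objects, this cpu only */
		int	 count;
	};

	struct zone {
		struct pcpu_cache cache[MAXCPU];
		/* ... global depot, object size, etc ... */
	};

	void	*zone_alloc_slow(struct zone *);	/* refill from depot */

	void *
	zone_alloc(struct zone *z)
	{
		struct pcpu_cache *pc;
		void *obj;

		crit_enter();			/* no preemption, no migration */
		pc = &z->cache[mycpuid];	/* this cpu's private cache */
		obj = pc->freelist;
		if (obj != NULL) {
			pc->freelist = *(void **)obj;	/* pop head */
			pc->count--;
			crit_exit();
			return (obj);		/* common case: no mutex at all */
		}
		crit_exit();
		/* refill path runs off the critical path; may use IPIs or locks */
		return (zone_alloc_slow(z));
	}

    The point is that the fast path only ever touches the cache of the cpu
    it is currently running on, so a critical section is all the
    synchronization it needs.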
    Basically, any non-critical-path operation can use this trick in order
    to allow the real critical path -- the actual packet traffic -- to
    operate without mutexes.

    So, instead of adding more hacks, please just *fix* the slab allocator
    in FreeBSD-5.  You will find that a lot of things you were
    contemplating writing additional subsystems for will suddenly work
    (and work very efficiently) by just calling the slab allocator
    directly.

    The problem with per-thread caching is that you greatly increase the
    amount of waste in the system.  If you have 50 threads, each with its
    own per-thread cache and a hysteresis of, say, 32 allocations, you
    wind up with 50*32 = 1600 allocations worth of potential waste.  In
    the per-cpu case the slop is a lot more deterministic (since the
    number of cpus is a fixed, known quantity).

    Another problem with per-thread caching is that it greatly reduces
    performance in certain common allocation cases... in particular, the
    case where data is allocated by one subsystem (say, an interrupt
    thread) and freed by another subsystem (say, a protocol thread or
    other consumer).  This sort of problem is a lot easier to fix with a
    per-cpu cache organization (see the free-path sketch below) and a lot
    harder to fix with a per-thread cache organization.

					-Matt
					Matthew Dillon
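    As an illustrative counterpart to the zone_alloc() sketch above (same
    made-up names; the hysteresis limit and slow-path helper are assumed
    for the example), a per-cpu free path shows why the "allocated in one
    subsystem, freed in another" case is easy to handle: the object simply
    goes onto the cache of whatever cpu the freeing thread happens to be
    running on, regardless of which thread allocated it, and any surplus is
    handed back to the global depot off the critical path.

	#define PCPU_CACHE_MAX	32	/* illustrative hysteresis limit */

	void	zone_free_slow(struct zone *, struct pcpu_cache *);

	void
	zone_free(struct zone *z, void *obj)
	{
		struct pcpu_cache *pc;

		crit_enter();			/* no preemption, no migration */
		pc = &z->cache[mycpuid];
		*(void **)obj = pc->freelist;	/* push onto this cpu's list */
		pc->freelist = obj;
		if (++pc->count > PCPU_CACHE_MAX)
			zone_free_slow(z, pc);	/* return surplus to depot */
		crit_exit();
	}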