From owner-freebsd-current@FreeBSD.ORG Thu Sep 9 06:00:49 2004
Date: Thu, 9 Sep 2004 02:00:14 -0400 (EDT)
From: Robert Watson <robert@fledge.watson.org>
To: Matthew Dillon
Cc: scottl@freebsd.org, Gerrit Nagelhout, current@freebsd.org
In-Reply-To: <200409090445.i894jRei071606@apollo.backplane.com>
Subject: Re: FreeBSD 5.3 Bridge performance take II

On Wed, 8 Sep 2004, Matthew Dillon wrote:

> I would recommend against per-thread caches.  Instead, make the per-cpu
> caches actually *be* per-cpu (that is, not require a mutex).  This is

One of the paragraphs you appear not to have quoted from my e-mail was this
one:

% One nice thing about using this experimental code is that I hope it will
% allow us to reason more effectively about the extent to which improving
% per-cpu data structures improves efficiency -- I can now much more easily
% say "OK, what happens if we eliminate the cost of locking for commonplace
% mbuf allocation/free".  I've also started looking at per-interface caches
% based on the same model, which has some similar limitations (but also
% some similar benefits), such as stuffing per-interface uma caches in
% struct ifnet.

I.e., using per-thread UMA caches is a 30-60 minute hack that allows me to
explore and measure the performance benefits (and costs) of several
different approaches -- per-cpu, per-thread, and per-data-structure/object
caching -- without doing the full implementation up front.  Per-thread
caching, for example, can simulate the effects of non-preemption and mutex
avoidance in micro-benchmarks, although from a macro-benchmark perspective
it suffers from a number of problems in the general case (draining,
balancing, and extra storage cost among them).  I didn't attempt to address
these problems, on the assumption that the current implementation is a tool
for exploring performance, not something to actually use.

In doing so, my hope was to identify which areas will offer the most
immediate performance benefits, be it simply cutting down on costly
operations (such as the entropy harvesting code for Yarrow, which appears
to have found its way into our interrupt path), rethinking locking
strategies, optimizing out or coalescing locking, optimizing out excess
memory allocation, optimizing synchronization primitives while keeping the
same semantics, or changing synchronization assumptions to offer
weaker/stronger semantics.
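To make that concrete, here is a rough sketch of the kind of lock-free fast
path I mean.  The names below (thr_mcache, thr_mcache_alloc(), and the idea
of hanging the cache off struct thread) are invented purely for
illustration and are not what is in the actual hack:

/*
 * Illustrative sketch only -- not the real patch.  Each thread carries a
 * small private stash of free items that is refilled from, and drained
 * back to, the ordinary UMA zone, so the common alloc/free path touches
 * no mutex at all.  In the real hack something like this would hang off
 * struct thread (say, via a hypothetical td_mcache pointer).
 */
#include <sys/param.h>
#include <sys/proc.h>
#include <vm/uma.h>

#define	THR_MCACHE_SIZE	32

struct thr_mcache {
	void	*tc_items[THR_MCACHE_SIZE];	/* privately cached items */
	int	 tc_count;			/* how many are stashed */
};

static __inline void *
thr_mcache_alloc(struct thr_mcache *tc, uma_zone_t zone, int flags)
{

	/* Fast path: the cache is private to this thread, so no lock. */
	if (tc->tc_count > 0)
		return (tc->tc_items[--tc->tc_count]);

	/* Slow path: fall back to the normal (locked) UMA allocation. */
	return (uma_zalloc(zone, flags));
}

static __inline void
thr_mcache_free(struct thr_mcache *tc, uma_zone_t zone, void *item)
{

	/* Stash locally if there is room; otherwise hand back to UMA. */
	if (tc->tc_count < THR_MCACHE_SIZE)
		tc->tc_items[tc->tc_count++] = item;
	else
		uma_zfree(zone, item);
}

The draining and balancing problems mentioned above fall directly out of
this structure: items stranded in an idle thread's stash are invisible to
every other thread until something forces them back into the zone.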
Right now, though, the greatest obstacle in my immediate path appears to be
a bug in the current version of the if_em driver that causes the interfaces
on my test box to wedge under even moderate load.  The if_em cards I have
on other machines seem not to do this, which suggests a driver weirdness
with this particular version of the chipset/card.  Go figure...

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research