From owner-freebsd-stable@FreeBSD.ORG Mon May 9 15:24:33 2005 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3DCFD16A4E9 for ; Mon, 9 May 2005 15:24:33 +0000 (GMT) Received: from pooker.samsco.org (pooker.samsco.org [168.103.85.57]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6980343D80 for ; Mon, 9 May 2005 15:24:30 +0000 (GMT) (envelope-from scottl@samsco.org) Received: from [192.168.254.21] (rat.samsco.home [192.168.254.21]) (authenticated bits=0) by pooker.samsco.org (8.13.3/8.13.3) with ESMTP id j49FTQrP028795; Mon, 9 May 2005 09:29:26 -0600 (MDT) (envelope-from scottl@samsco.org) Message-ID: <427F8076.7030105@samsco.org> Date: Mon, 09 May 2005 09:23:34 -0600 From: Scott Long User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.7.5) Gecko/20050321 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Ewan Todd References: <20050509150018.GF281@mathcode.net> In-Reply-To: <20050509150018.GF281@mathcode.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.8 required=3.8 tests=ALL_TRUSTED autolearn=failed version=3.0.2 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on pooker.samsco.org cc: freebsd-stable Subject: Re: Performance issue X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 May 2005 15:24:33 -0000 Ewan Todd wrote: > Hi All, > > I have what I think is a serious performance issue with fbsd 5.3 > release. I've read about threading issues, and it seems to me that > that is what I'm looking at, but I'm not confident enough to rule out > that it might be a hardware issue, a kernel configuration issue, or > something to do with the python port. I'd appreciate it if someone > would it point out if I'm overlooking something obvious. Otherwise, > if it is the problem I think it is, then there seems entirely too > little acknowledgement of a major issue. > > Here's the background. I just got a new (to me) AMD machine and put > 5.3 release on it. I'd been very happy with the way my old Intel > machine had been performing with 4.10 stable, and I decided to run a > simple performance diagnostic on both machines, to wow myself with the > amazing performance of the new hardware / kernel combination. > However, the result was pretty disappointing. > > Here are what I think are the pertinent dmesg details. > > Old rig: > > FreeBSD 4.10-RELEASE #0: Thu Jul 1 22:47:08 EDT 2004 > Timecounter "i8254" frequency 1193182 Hz > Timecounter "TSC" frequency 449235058 Hz > CPU: Pentium III/Pentium III Xeon/Celeron (449.24-MHz 686-class CPU) > > New rig: > > FreeBSD 5.3-RELEASE #0: Fri Nov 5 04:19:18 UTC 2004 > Timecounter "i8254" frequency 1193182 Hz quality 0 > CPU: AMD Athlon(tm) Processor (995.77-MHz 686-class CPU) > Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 > Timecounter "TSC" frequency 995767383 Hz quality 800 > Timecounters tick every 10.000 msec > > The diagnostic I selected was a python program to generate 1 million > pseudo-random numbers and then to perform a heap sort on them. That > code is included at the foot of this email. I named the file > "heapsort.py". I ran it on both machines, using the "time" utility in > /usr/bin/ (not the builtin tcsh "time"). So the command line was > > /usr/bin/time -al -o heapsort.data ./heapsort.py 1000000 > > A typical result for the old rig was > > 130.78 real 129.86 user 0.11 sys > 22344 maximum resident set size > 608 average shared memory size > 20528 average unshared data size > 128 average unshared stack size > 5360 page reclaims > 0 page faults > 0 swaps > 0 block input operations > 0 block output operations > 0 messages sent > 0 messages received > 0 signals received > 0 voluntary context switches > 2386 involuntary context switches > > Whereas, the typical result for the new rig looked more like > > 105.36 real 71.10 user 33.41 sys > 23376 maximum resident set size > 659 average shared memory size > 20796 average unshared data size > 127 average unshared stack size > 5402 page reclaims > 0 page faults > 0 swaps > 0 block input operations > 0 block output operations > 0 messages sent > 0 messages received > 0 signals received > 0 voluntary context switches > 10548 involuntary context switches > > You'll notice that the new rig is indeed a little faster (times in > seconds): 105.36 real (new rig) compared with 130.78 real (old rig). > > However, the new rig spends about 33.41 seconds on system overhead > compared with just 0.11 seconds on the old rig. Comparing the rusage > stats, the only significant difference is the "involuntary context > switches" field, where the old rig has 2386 and the new rig has a > whopping 10548. Further, I noticed that the number of context > switches on the new rig seems to be more or less exactly one per 10 > msec of real time, that is, one per timecounter tick. (I saw this > when comparing heapsort.py runs with arguments other than 1000000.) > > I think the new rig ought to execute this task in about 70 seconds: > just over the amount of user time. Assuming that I'm not overlooking > something obvious, and that I'm not interpreting a feature as a bug, > this business with the context switches strikes me as a bit of a > show-stopper. If that's right, it appears to be severely underplayed > in the release documentation. > > I'll be happy if someone would kindly explain to me what's going on > here. I'll be even happier to hear of a fix or workaround to remedy > the situation. > > Thanks in advance, > > -e > > First of all, make sure that you have WITNESS and INVARIANTS off in your kernel. You might also want to recompile your kernel with the SMP option turned off. Scott