From owner-freebsd-stable@FreeBSD.ORG Mon May 9 15:00:05 2005 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B202916A4E8 for ; Mon, 9 May 2005 15:00:05 +0000 (GMT) Received: from ylpvm15.prodigy.net (ylpvm15-ext.prodigy.net [207.115.57.46]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4F60343D66 for ; Mon, 9 May 2005 15:00:05 +0000 (GMT) (envelope-from ewan@mathcode.net) Received: from pimout2-ext.prodigy.net (pimout2-ext.prodigy.net [207.115.63.101])j49EuGDE025310 for ; Mon, 9 May 2005 10:56:17 -0400 X-ORBL: [68.23.218.132] Received: from mail.mathcode.net (adsl-68-23-218-132.dsl.wotnoh.ameritech.net [68.23.218.132])j49F03md047478 for ; Mon, 9 May 2005 11:00:04 -0400 Received: from bsd.mathcode.net (localhost [127.0.0.1]) by mail.mathcode.net (8.12.11/8.12.11) with ESMTP id j49F0Igr009628 for ; Mon, 9 May 2005 11:00:18 -0400 (EDT) (envelope-from ewan@bsd.mathcode.net) Received: (from ewan@localhost) by bsd.mathcode.net (8.12.11/8.12.11/Submit) id j49F0I9w009627 for freebsd-stable@freebsd.org; Mon, 9 May 2005 11:00:18 -0400 (EDT) (envelope-from ewan) Date: Mon, 9 May 2005 11:00:18 -0400 From: Ewan Todd To: freebsd-stable Message-ID: <20050509150018.GF281@mathcode.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i X-Spam-Status: No, hits=0.0 required=5.0 tests=none autolearn=no version=2.64 X-Spam-Checker-Version: SpamAssassin 2.64 (2004-01-11) on bsd.mathcode.net Subject: Performance issue X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 May 2005 15:00:05 -0000 Hi All, I have what I think is a serious performance issue with fbsd 5.3 release. I've read about threading issues, and it seems to me that that is what I'm looking at, but I'm not confident enough to rule out that it might be a hardware issue, a kernel configuration issue, or something to do with the python port. I'd appreciate it if someone would it point out if I'm overlooking something obvious. Otherwise, if it is the problem I think it is, then there seems entirely too little acknowledgement of a major issue. Here's the background. I just got a new (to me) AMD machine and put 5.3 release on it. I'd been very happy with the way my old Intel machine had been performing with 4.10 stable, and I decided to run a simple performance diagnostic on both machines, to wow myself with the amazing performance of the new hardware / kernel combination. However, the result was pretty disappointing. Here are what I think are the pertinent dmesg details. Old rig: FreeBSD 4.10-RELEASE #0: Thu Jul 1 22:47:08 EDT 2004 Timecounter "i8254" frequency 1193182 Hz Timecounter "TSC" frequency 449235058 Hz CPU: Pentium III/Pentium III Xeon/Celeron (449.24-MHz 686-class CPU) New rig: FreeBSD 5.3-RELEASE #0: Fri Nov 5 04:19:18 UTC 2004 Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: AMD Athlon(tm) Processor (995.77-MHz 686-class CPU) Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 Timecounter "TSC" frequency 995767383 Hz quality 800 Timecounters tick every 10.000 msec The diagnostic I selected was a python program to generate 1 million pseudo-random numbers and then to perform a heap sort on them. That code is included at the foot of this email. I named the file "heapsort.py". I ran it on both machines, using the "time" utility in /usr/bin/ (not the builtin tcsh "time"). So the command line was /usr/bin/time -al -o heapsort.data ./heapsort.py 1000000 A typical result for the old rig was 130.78 real 129.86 user 0.11 sys 22344 maximum resident set size 608 average shared memory size 20528 average unshared data size 128 average unshared stack size 5360 page reclaims 0 page faults 0 swaps 0 block input operations 0 block output operations 0 messages sent 0 messages received 0 signals received 0 voluntary context switches 2386 involuntary context switches Whereas, the typical result for the new rig looked more like 105.36 real 71.10 user 33.41 sys 23376 maximum resident set size 659 average shared memory size 20796 average unshared data size 127 average unshared stack size 5402 page reclaims 0 page faults 0 swaps 0 block input operations 0 block output operations 0 messages sent 0 messages received 0 signals received 0 voluntary context switches 10548 involuntary context switches You'll notice that the new rig is indeed a little faster (times in seconds): 105.36 real (new rig) compared with 130.78 real (old rig). However, the new rig spends about 33.41 seconds on system overhead compared with just 0.11 seconds on the old rig. Comparing the rusage stats, the only significant difference is the "involuntary context switches" field, where the old rig has 2386 and the new rig has a whopping 10548. Further, I noticed that the number of context switches on the new rig seems to be more or less exactly one per 10 msec of real time, that is, one per timecounter tick. (I saw this when comparing heapsort.py runs with arguments other than 1000000.) I think the new rig ought to execute this task in about 70 seconds: just over the amount of user time. Assuming that I'm not overlooking something obvious, and that I'm not interpreting a feature as a bug, this business with the context switches strikes me as a bit of a show-stopper. If that's right, it appears to be severely underplayed in the release documentation. I'll be happy if someone would kindly explain to me what's going on here. I'll be even happier to hear of a fix or workaround to remedy the situation. Thanks in advance, -e heapsort.py: #!/usr/local/bin/python -O # $Id: heapsort-python-3.code,v 1.3 2005/04/04 14:56:45 bfulgham Exp $ # # The Great Computer Language Shootout # http://shootout.alioth.debian.org/ # # Updated by Valentino Volonghi for Python 2.4 # Reworked by Kevin Carson to produce correct results and same intent import sys IM = 139968 IA = 3877 IC = 29573 LAST = 42 def gen_random(max) : global LAST LAST = (LAST * IA + IC) % IM return( (max * LAST) / IM ) def heapsort(n, ra) : ir = n l = (n >> 1) + 1 while True : if l > 1 : l -= 1 rra = ra[l] else : rra = ra[ir] ra[ir] = ra[1] ir -= 1 if ir == 1 : ra[1] = rra return i = l j = l << 1 while j <= ir : if (j < ir) and (ra[j] < ra[j + 1]) : j += 1 if rra < ra[j] : ra[i] = ra[j] i = j j += j else : j = ir + 1; ra[i] = rra; def main() : if len(sys.argv) == 2 : N = int(sys.argv[1]) else : N = 1 ary = [None]*(N + 1) for i in xrange(1, N + 1) : ary[i] = gen_random(1.0) heapsort(N, ary) print "%.10f" % ary[N] main()