From: Maks Verver
Date: Mon, 08 Mar 2010 15:29:25 +0100
To: freebsd-arm@freebsd.org
Subject: Re: Performance of SheevaPlug on 8-stable
Message-ID: <4B9509C5.7050804@geocities.com>
In-Reply-To: <201003072125.o27LPfFb000968@casselton.net>

On 03/07/2010 10:25 PM, Mark Tinguely wrote:
> FreeBSD-current has kernel and user witness turned on. Witness is
> for locks, so it should not change the performance of a tight
> arithmetic loop like this.

For the record, I've been using 8-stable so far.

> I don't know the Marvell internals, and from what I can tell, their
> technical docs require NDA.

Yeah, that sucks. But I don't think the SheevaPlug contains a lot of
novel technology; it's just a slightly different configuration. In any
case, Linux seems to have more or less complete support for the
SheevaPlug (including L2 cache, SDIO and NAND flash), so for details
the GPL'ed Linux source code may be helpful.

> It looks like from the cpu identification that the branch
> prediction is turned on. Branch prediction compensates for the longer
> pipelines. I can't see how in the tight loop how that could go
> astray.

Well, since the Linux version of the test program runs exactly as well
as I expect (or could ever hope for), I don't have any doubts that the
CPU is able to run the tight loop efficiently. The question (for me) is
why it doesn't run just as well on FreeBSD.

I tried a couple of the suggestions:

Mark Tinguely wrote:
> Thinking way out of the box ... has anyone tried this in single user
> mode?

I did, and it still takes 287 seconds (same as before).

Petter Selasky wrote:
> Was the output from "vmstat -i" and "top" posted?

Not yet. vmstat -i reports:

    interrupt              total       rate
    irq1: timer0          130981        999
    irq33: uart0             477          3
    irq19: ehci0             875          6
    Total                 132333       1010

Which looks entirely reasonable to me. Top contains the same info as
the time data I posted: 99.x% of CPU time is spent in user mode, with
lots of free memory. So it seems the kernel has very little to do with
this.

Next up, this patch:
> http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff

No idea what this does, but it helps a lot:

    %time ./test
    9.000u 0.000s 0:09.11 99.2% 40+1324k 0+0io 0pf+0w

That's much better than the 280+ seconds from before. But it's still
nearly twice as long as Linux takes.

There is more weirdness though. If I freshly boot the system I get
timings like these, and even nbench reports decent scores.
However, if I do a couple of things like rerun/recompile nbench, then at
some point something 'breaks' and the performance goes back down to
what it used to be. So Mark's patch definitely touches on something
related to the problem, but doesn't solve it completely.

I still have no clue what's going on, but I'm willing to try out
suggestions if anyone has them. :-)

- Maks Verver.
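
One way to pin down when the performance 'breaks' would be to run the
test program repeatedly after a fresh boot and note the first run whose
timing jumps. What follows is only a rough sketch under assumptions, not
something that was run on the SheevaPlug: the helper names
(time_command, first_slow_run, watch) and the factor of 3 are made up
for illustration, and ./test stands for the test program timed above.

```python
import subprocess
import time


def time_command(argv):
    """Return the wall-clock seconds that one run of argv takes."""
    start = time.monotonic()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.monotonic() - start


def first_slow_run(timings, factor=3.0):
    """Return the index of the first run slower than factor times the
    first run, or None if no run is that slow.

    The first run is taken as the baseline, so this assumes the series
    starts on a freshly booted (fast) system.
    """
    if not timings:
        return None
    baseline = timings[0]
    for index, seconds in enumerate(timings):
        if seconds > factor * baseline:
            return index
    return None


def watch(argv, runs=20, factor=3.0):
    """Run argv up to `runs` times and stop at the first slow run."""
    timings = []
    for run in range(runs):
        timings.append(time_command(argv))
        print(f"run {run}: {timings[-1]:.2f}s")
        if first_slow_run(timings, factor) is not None:
            print(f"slowdown first seen at run {run}")
            break
    return timings
```

Calling watch(["./test"]) while rebuilding or rerunning nbench in between
would show roughly how much activity it takes before the timing
degrades. The threshold is arbitrary; with timings of about 9 seconds
versus 280+ seconds, almost any factor between 2 and 30 would separate
the two cases.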