From owner-freebsd-arm@FreeBSD.ORG  Mon Mar  8 14:29:36 2010
Return-Path: <owner-freebsd-arm@FreeBSD.ORG>
Delivered-To: freebsd-arm@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D1EC5106566B
	for <freebsd-arm@freebsd.org>; Mon,  8 Mar 2010 14:29:36 +0000 (UTC)
	(envelope-from maksverver@geocities.com)
Received: from mx.utwente.nl (mx3.utsp.utwente.nl [130.89.2.14])
	by mx1.freebsd.org (Postfix) with ESMTP id 2AE8F8FC0A
	for <freebsd-arm@freebsd.org>; Mon,  8 Mar 2010 14:29:35 +0000 (UTC)
Received: from heaven.student.utwente.nl (heaven.student.utwente.nl
	[130.89.167.52])
	by mx.utwente.nl (8.12.10/SuSE Linux 0.7) with ESMTP id o28ETRHH017262
	for <freebsd-arm@freebsd.org>; Mon, 8 Mar 2010 15:29:27 +0100
Message-ID: <4B9509C5.7050804@geocities.com>
Date: Mon, 08 Mar 2010 15:29:25 +0100
From: Maks Verver <maksverver@geocities.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.1.8) Gecko/20100308 Thunderbird/3.0.3
MIME-Version: 1.0
To: freebsd-arm@freebsd.org
References: <201003072125.o27LPfFb000968@casselton.net>
In-Reply-To: <201003072125.o27LPfFb000968@casselton.net>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: Performance of SheevaPlug on 8-stable
X-BeenThere: freebsd-arm@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the StrongARM Processor <freebsd-arm.freebsd.org>

On 03/07/2010 10:25 PM, Mark Tinguely wrote:
> FreeBSD-current has kernel and user witness turned on. Witness is
> for locks, so it should not change the performance of a tight
> arithmetic loop like this.

For the record, I've been using 8-stable so far.

> I don't know the Marvell internals, and from what I can tell, their
> technical docs require an NDA.

Yeah, that sucks. But I don't think the SheevaPlug contains a lot of
novel technology; it's just a slightly different configuration. In any
case, Linux seems to have more or less complete support for the
SheevaPlug (including L2 cache, SDIO and NAND flash), so the GPL'ed
Linux source code may be helpful for details.

> It looks from the CPU identification like branch prediction is
> turned on. Branch prediction compensates for the longer pipelines.
> I can't see how that could go astray in the tight loop.

Well, since the Linux version of the test program runs exactly as well
as I expect (or could ever hope for) I don't have any doubts that the
CPU is able to run the tight loop efficiently. The question (for me) is
why it doesn't run just as well on FreeBSD.

I tried a couple of the suggestions:

Mark Tinguely wrote:
> Thinking way out of the box ... has anyone tried this in single user
> mode?

I did, and it still takes 287 seconds (same as before).

Petter Selasky wrote:
> Was the output from "vmstat -i" and "top" posted?

Not yet. vmstat -i reports:

  interrupt                   total       rate
  irq1: timer0               130981        999
  irq33: uart0                  477          3
  irq19: ehci0                  875          6
  Total                      132333       1010

That looks entirely reasonable to me. top shows the same thing as the
time data I posted: 99.x% of CPU time is spent in user mode, with lots of
free memory. So it seems the kernel has very little to do with this.
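
As a quick sanity check on the vmstat -i numbers, here is a minimal Python sketch. It assumes the rate column is simply the total divided by uptime, reported as a whole number. The row values are copied from the output above; the function names are made up for illustration. A timer rate of about 1000 per second would match a 1000 Hz clock tick, but that is an assumption, since kern.hz is not stated anywhere in this thread.

```python
# Rows copied from the vmstat -i output above: (source, total, rate per second).
ROWS = [
    ("irq1: timer0", 130981, 999),
    ("irq33: uart0", 477, 3),
    ("irq19: ehci0", 875, 6),
]


def implied_uptime(total, rate):
    """Seconds of uptime implied by total / rate (rate is a whole number)."""
    return total / rate


def timer_share(rows):
    """Fraction of all counted interrupts that come from the timer."""
    grand_total = sum(total for _, total, _ in rows)
    timer_total = sum(total for name, total, _ in rows if "timer" in name)
    return timer_total / grand_total


if __name__ == "__main__":
    for name, total, rate in ROWS:
        print(f"{name:15s} ~{implied_uptime(total, rate):6.1f} s of uptime implied")
    print(f"timer share of interrupts: {timer_share(ROWS):.1%}")
```

The low-rate rows give a coarser uptime estimate because their rates are small integers, so the timer row is the one to trust. Nearly all interrupts are timer ticks, which fits the reading that nothing unusual is happening on the interrupt side.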

Next up, this patch:

> http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff

No idea what this does, but it helps a lot:

  %time ./test
  9.000u 0.000s 0:09.11 99.2%	40+1324k 0+0io 0pf+0w

That's much better than the 280+ seconds from before. But it's still
nearly twice as long as Linux takes.
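
To put a number on the improvement, a small Python sketch that parses the csh-style time line above and compares it with the 287 seconds measured before the patch. The regular expression only covers the first four fields (user, system, elapsed, CPU percentage), and the helper name is hypothetical.

```python
import re

# Output line copied from the run above; 287 s is the pre-patch runtime.
TIME_LINE = "9.000u 0.000s 0:09.11 99.2%\t40+1324k 0+0io 0pf+0w"
OLD_SECONDS = 287.0

_PATTERN = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+(?P<pct>[\d.]+)%"
)


def parse_time_line(line):
    """Parse the leading fields of a csh-style time line into floats."""
    match = _PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised time line: {line!r}")
    # Elapsed time is printed as [[h:]m:]s.ss, so fold it into seconds.
    elapsed = 0.0
    for part in match.group("elapsed").split(":"):
        elapsed = elapsed * 60 + float(part)
    return {
        "user": float(match.group("user")),
        "sys": float(match.group("sys")),
        "elapsed": elapsed,
        "percent": float(match.group("pct")),
    }


if __name__ == "__main__":
    parsed = parse_time_line(TIME_LINE)
    print(f"elapsed: {parsed['elapsed']:.2f} s")
    print(f"speedup over {OLD_SECONDS:.0f} s: {OLD_SECONDS / parsed['elapsed']:.1f}x")
```

By this arithmetic the patched kernel runs the test roughly 30 times faster than before.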

There is more weirdness though. If I freshly boot the system I get
timings like these, and even nbench reports decent scores. However, if I
do a couple things like rerun/recompile nbench, then at some point
something 'breaks' and the performance goes back down to what it used to be.
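
One way to pin down when the slowdown kicks in would be to time the test program repeatedly and flag the first run that is much slower than the first one. The following Python sketch does that under some assumptions: ./test is the benchmark binary from above, the factor of 3 is an arbitrary threshold, and the function names are invented for illustration. Since the slowdown seems to need other activity in between (such as recompiling nbench), that activity would have to be interleaved by hand or added to the loop.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def first_slow_run(durations, factor=3.0):
    """Index of the first run slower than factor x the first run, or None."""
    if not durations:
        return None
    baseline = durations[0]
    for index, duration in enumerate(durations):
        if duration > factor * baseline:
            return index
    return None


if __name__ == "__main__":
    durations = []
    for run in range(20):
        durations.append(time_command(["./test"]))
        print(f"run {run}: {durations[-1]:.2f} s", flush=True)
        if first_slow_run(durations) is not None:
            print("slowdown detected; note what else ran since the previous run")
            break
```

The first run after a fresh boot serves as the baseline, so this only makes sense when started right after rebooting.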

So Mark's patch definitely touches on something related to the problem,
but doesn't quite solve the problem completely. I still have no clue
what's going on, but I'm willing to try out suggestions if anyone has
them. :-)

 - Maks Verver.