Date:      Sun, 24 Dec 1995 02:15:58 +0100
From:      Torbjorn Granlund <tege@matematik.su.se>
To:        freebsd-hackers@freebsd.org
Subject:   Pentium bcopy
Message-ID:  <199512240116.CAA26645@insanus.matematik.su.se>

I sent you patches to improve the support.s bcopy a few months ago.  I have
not heard anything back (sic).  Maybe I should just give up, and use some
other operating system, where bug reports and contributions from external
people are considered?  Well, I won't give up just yet!  ;-)

Now, that is a diplomatic way of starting a message...

This time I want to help improve the bcopy/memcpy/memmove functions for
the Pentium (and 486).  Here is a skeleton bcopy/memcpy that runs about 5
times faster than your current implementation on a Pentium.  This bcopy
handles up to about 350 MB/s on a Pentium 133, compared to the current 70
MB/s.

The reason that this is so much faster is that it uses the dual-ported cache
in a near-optimal way.  Your code seems to rely on rep+movsl, which is much
slower.

Well, I haven't bothered to integrate this into your infrastructure since
that might be a waste of my time, if you just keep ignoring my messages.  If
you are interested in this optimization, I volunteer to do the rest of the
work.

Note that bzero can be sped up in the same way.  I have a feeling that
bcopy/bzero are used now and then by the VM system...
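
To make the bzero remark concrete, here is a rough C sketch of the same
idea applied to zeroing: touch the destination cache line with a read, then
store eight words per iteration.  The name zero_words and the word-count
interface are made up for illustration and are not taken from any FreeBSD
source.  A C compiler is free to reorder or merge the stores, so this shows
the structure only and makes no claim about timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero nwords 32-bit words starting at dst. */
void zero_words(uint32_t *dst, size_t nwords)
{
    size_t blocks = nwords >> 3;   /* number of full 8-word blocks */
    size_t tail = nwords & 7;      /* 0-7 words left over */

    assert(nwords == 0 || dst != NULL);

    while (blocks-- > 0) {
        /* Read one word of the block first so the line is brought into
           the cache before the stores; volatile keeps the compiler from
           dropping the load. */
        (void)*(volatile uint32_t *)(dst + 7);

        dst[0] = 0; dst[1] = 0;
        dst[2] = 0; dst[3] = 0;
        dst[4] = 0; dst[5] = 0;
        dst[6] = 0; dst[7] = 0;
        dst += 8;
    }

    /* Clear the last 0-7 words one at a time. */
    while (tail-- > 0)
        *dst++ = 0;
}
```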

/* Pentium bcopy */
	.text
	.align 4
	.globl	_copy
_copy:	pushl	%edi
	pushl	%esi

	movl	12(%esp),%edi	/* destination pointer */
	movl	16(%esp),%esi	/* source pointer */
	movl	20(%esp),%ecx	/* size (in 32-bit words) */

	shrl	$3,%ecx		/* count for unrolled loop */
	jz	Lend		/* if zero, skip unrolled loop */

	movl	(%edi),%eax	/* Fetch destination cache line */

	.align	2,0x90		/* supply 0x90 for broken assemblers */
Loop:	movl	28(%edi),%eax	/* allocate cache line for destination */
	nop			/* we want these two insn to pair! */

	movl	(%esi),%eax	/* read words pairwise */
	movl	4(%esi),%edx
	movl	%eax,(%edi)	/* store words pairwise */
	movl	%edx,4(%edi)

	movl	8(%esi),%eax
	movl	12(%esi),%edx
	movl	%eax,8(%edi)
	movl	%edx,12(%edi)

	movl	16(%esi),%eax
	movl	20(%esi),%edx
	movl	%eax,16(%edi)
	movl	%edx,20(%edi)

	movl	24(%esi),%eax
	movl	28(%esi),%edx
	movl	%eax,24(%edi)
	movl	%edx,28(%edi)

	addl	$32,%esi	/* update source pointer */
	addl	$32,%edi	/* update destination pointer */
	decl	%ecx		/* decr loop count */
	jnz	Loop

/* Copy last 0-7 words */
Lend:	movl	20(%esp),%ecx
	andl	$7,%ecx
	cld
	rep
	movsl

	popl	%esi
	popl	%edi
	ret
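
For readers who do not read AT&T syntax, the loop structure above can be
modelled in C roughly as follows.  The name copy_words is made up for
illustration, the size is taken in 32-bit words as in the assembly, and
overlapping buffers are assumed not to occur.  The compiler decides the
actual instruction scheduling, so this sketch documents the logic and does
not reproduce the pairing behaviour or the quoted speeds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the skeleton: copy nwords 32-bit words. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    size_t blocks = nwords >> 3;   /* shrl $3,%ecx: 8-word blocks */
    size_t tail = nwords & 7;      /* andl $7,%ecx: last 0-7 words */

    assert(nwords == 0 || (dst != NULL && src != NULL));

    while (blocks-- > 0) {
        uint32_t a, d;

        /* Like "movl 28(%edi),%eax": read from the destination block so
           its cache line is allocated before the stores.  volatile keeps
           the compiler from removing the otherwise unused load. */
        (void)*(volatile uint32_t *)(dst + 7);

        /* Read two words, then store two words, four times over. */
        a = src[0]; d = src[1]; dst[0] = a; dst[1] = d;
        a = src[2]; d = src[3]; dst[2] = a; dst[3] = d;
        a = src[4]; d = src[5]; dst[4] = a; dst[5] = d;
        a = src[6]; d = src[7]; dst[6] = a; dst[7] = d;

        src += 8;                  /* addl $32,%esi */
        dst += 8;                  /* addl $32,%edi */
    }

    /* Stands in for "rep movsl" on the remaining words. */
    while (tail-- > 0)
        *dst++ = *src++;
}
```

With 19 words, for instance, the unrolled loop runs twice and the tail loop
copies the remaining three.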