Date:      Tue, 30 Dec 2003 04:16:29 -0600 (CST)
From:      James Van Artsdalen <james-freebsd-amd64@jrv.org>
To:        freebsd-amd64@freebsd.org
Subject:   Re: libc assembly optimizations?
Message-ID:  <200312301016.hBUAGT4Q085640@bigtex.jrv.org>

Here's an alternative for fabs(3):

ENTRY(fabs)
	psllq	$1,%xmm0	/* 64-bit logical shift left discards the sign bit */
	psrlq	$1,%xmm0	/* logical shift right clears sign */
	ret

/usr/src/lib/libc/amd64/gen/fabs.S currently uses the x87-based code
quoted at the end of this message, and gcc generates essentially the
same code.  The shifts above seem to work and look better to me.
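
For anyone who wants to try the shift trick outside libc, here is a rough
C sketch using SSE2 intrinsics.  The function name fabs_shift is made up
for illustration, and this is not what fabs.S or gcc actually emits; it
only mirrors the psllq/psrlq pair on the low 64 bits of an xmm register.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; SSE2 is baseline on amd64 */

/* Hypothetical helper, not part of libc: clear the sign bit of a double
 * by shifting it out the top and shifting a zero back in. */
double fabs_shift(double x)
{
    __m128i bits = _mm_castpd_si128(_mm_set_sd(x)); /* x in the low 64 bits */

    bits = _mm_slli_epi64(bits, 1);  /* psllq $1: sign bit falls off the top */
    bits = _mm_srli_epi64(bits, 1);  /* psrlq $1: a zero enters at bit 63 */

    return _mm_cvtsd_f64(_mm_castsi128_pd(bits));
}
```

Exponent and mantissa bits are untouched, so -0.0 becomes +0.0 and
infinities and NaNs keep their payload with the sign cleared.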

The string ops can be significantly improved if allowed to
read extra bytes around the string but within the same 16-byte
paragraph as the start or end of the string.  This seems safe in
userland.
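
To make the idea concrete, here is a minimal sketch of a strlen that only
issues aligned 16-byte loads, so every read stays inside a paragraph that
also contains part of the string.  The name strlen_sse2 is hypothetical,
__builtin_ctz assumes GCC or Clang, and ISO C does not sanction reading
outside the object, so treat this as an illustration of the machine-level
idea rather than a proposed libc routine.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: strlen using aligned 16-byte loads only. */
size_t strlen_sse2(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    uintptr_t addr = (uintptr_t)s;
    unsigned skip = (unsigned)(addr & 15);  /* bytes in the paragraph before s */
    const char *p = (const char *)(addr - skip);  /* paragraph start */

    /* First load may include bytes before s, but stays in s's paragraph. */
    __m128i chunk = _mm_load_si128((const __m128i *)p);
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    mask &= ~0u << skip;  /* ignore NUL bytes that precede the string */

    /* Later loads may run past the terminator, but only within its paragraph. */
    while (mask == 0) {
        p += 16;
        chunk = _mm_load_si128((const __m128i *)p);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    }

    /* Lowest set bit marks the first NUL in this paragraph. */
    return (size_t)((uintptr_t)p + (unsigned)__builtin_ctz(mask) - addr);
}
```

Since a page boundary is always a 16-byte boundary, an aligned load cannot
touch a page the string itself does not touch, which is why the extra
reads cannot fault.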

Finally, can the SSE2 regs be safely used in kernel mode?
Page fill and aligned-bulk bcopy calls can be improved this way.
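
For reference, the kind of loop I have in mind looks roughly like the
userland sketch below.  The names copy_aligned16 and fill_zero_aligned16
are made up, both assume 16-byte-aligned pointers and a length that is a
multiple of 16, and nothing here addresses whether the xmm registers may
be touched in kernel mode.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: copy len bytes, 16 at a time.
 * Assumes dst and src are 16-byte aligned and len is a multiple of 16. */
void copy_aligned16(void *dst, const void *src, size_t len)
{
    __m128i *d = dst;
    const __m128i *s = src;

    for (size_t i = 0; i < len / 16; i++)
        _mm_store_si128(&d[i], _mm_load_si128(&s[i]));  /* aligned load/store */
}

/* Hypothetical helper: zero len bytes, 16 at a time, same assumptions. */
void fill_zero_aligned16(void *dst, size_t len)
{
    __m128i *d = dst;
    const __m128i zero = _mm_setzero_si128();

    for (size_t i = 0; i < len / 16; i++)
        _mm_store_si128(&d[i], zero);
}
```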

/*
 * Ok, this sucks. Is there really no way to push an xmm register onto
 * the FP stack directly?
 */

ENTRY(fabs)
	movsd	%xmm0, -8(%rsp)
	fldl	-8(%rsp)
	fabs
	fstpl	-8(%rsp)
	movsd	-8(%rsp),%xmm0
	ret


