Date:      Thu,  9 Nov 95 14:03 IST
From:      koshy@blr.novell.com
To:        hasty@rah.star-gate.com ()
Cc:        freebsd-hackers@freebsd.org
Subject:   Load/Store using FPU regs ...
Message-ID:  <30a1bccf0.4265@novidc.blr.novell.com>
In-Reply-To: <199511071056.CAA02766@rah.star-gate.com> (hasty@rah.star-gate.com)

>>>>> "Amancio" == "Amancio Hasty Jr " <hasty@rah.star-gate.com> writes:

    >>> L20: fldl (%ebx) fstpl (%ecx) ...
    >>> 
    >>> The resulting program copies data at about 60 Megabytes per
    >>> second.


Using the FPU registers for memmove/bitblt operations was a technique
I first saw on an i860.  We used to do a series of reads into FPU regs
followed by a series of writes.  This benefited us because the memory
subsystem had an 11-clock latency for the first read, but could
deliver successive quadwords every 3 or so clocks.  Latency for the
first write was less than that for a read but was still significant.
Thus 16 reads followed by 16 writes ran faster than 16 reads
alternated with 16 writes.
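
The read-then-write grouping described above can be sketched in portable
C.  This is only an illustration of the access ordering, not the i860
code: the name copy_batched, the BATCH constant, and the use of 64-bit
integer temporaries in place of FPU registers are all assumptions, and an
optimizing compiler is free to reorder or vectorize the loops.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH 16  /* assumed batch size, matching the 16 reads / 16 writes above */

/* Hypothetical helper: copy n 64-bit words from src to dst, issuing
 * BATCH loads back to back and then BATCH stores back to back.
 * The two buffers must not overlap. */
static void copy_batched(uint64_t *dst, const uint64_t *src, size_t n)
{
    uint64_t tmp[BATCH];  /* stands in for the FPU register file */
    size_t i = 0;

    while (n - i >= BATCH) {
        /* Phase 1: a run of reads, so only the first pays full latency. */
        for (size_t k = 0; k < BATCH; k++)
            tmp[k] = src[i + k];
        /* Phase 2: a run of writes. */
        for (size_t k = 0; k < BATCH; k++)
            dst[i + k] = tmp[k];
        i += BATCH;
    }

    /* Tail: fewer than BATCH words left, copy them one at a time. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Taking the figures above at face value, 16 back-to-back reads would cost
roughly 11 + 15*3 = 56 clocks.  If every read in the alternating version
paid the full first-read latency (an assumption, not something measured
here), the reads alone would cost 16*11 = 176 clocks.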

Now, I'm not sure whether this approach can be used across all
processors.  Some FPUs can raise exceptions if illegal bit patterns
are loaded into their registers.  The x86 FPU in particular has very
few registers and a LIFO access pattern for loads and stores, so I
don't know if the same trick would work well for it.
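
To make the LIFO concern concrete, here is a toy model in C, assuming an
eight-entry register stack like the x87's.  It does not touch the real
FPU: struct fpu_stack, model_load, model_store_pop, and copy_lifo are
hypothetical names, and because it moves plain integers it says nothing
about the exception question.  It only shows that after a run of loads
the values come back out in reverse order, so the stores have to walk
the destination backwards.

```c
#include <stddef.h>
#include <stdint.h>

#define X87_DEPTH 8  /* the x87 register stack holds eight entries */

/* Toy model of a LIFO register stack; not real FPU access. */
struct fpu_stack {
    uint64_t reg[X87_DEPTH];
    int top;  /* number of occupied entries */
};

/* Model of a load: push a value onto the stack. */
static void model_load(struct fpu_stack *s, uint64_t v)
{
    s->reg[s->top++] = v;
}

/* Model of a store-and-pop: only the top of the stack can be stored. */
static uint64_t model_store_pop(struct fpu_stack *s)
{
    return s->reg[--s->top];
}

/* Hypothetical helper: batched copy through the modelled stack.
 * The two buffers must not overlap. */
static void copy_lifo(uint64_t *dst, const uint64_t *src, size_t n)
{
    struct fpu_stack s = { {0}, 0 };
    size_t i = 0;

    while (n - i >= X87_DEPTH) {
        /* Eight loads in ascending address order fill the stack. */
        for (int k = 0; k < X87_DEPTH; k++)
            model_load(&s, src[i + k]);
        /* The last value loaded pops first, so store in descending order. */
        for (int k = X87_DEPTH - 1; k >= 0; k--)
            dst[i + k] = model_store_pop(&s);
        i += X87_DEPTH;
    }

    /* Tail: copy any remaining words directly. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

In this model the batch can never exceed eight values, half the size of
the 16-read batch that paid off on the i860.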

Koshy



