Date:      Tue, 7 May 1996 17:07:22 -0700
From:      asami@cs.berkeley.edu (Satoshi Asami)
To:        bde@zeta.org.au
Cc:        bde@zeta.org.au, culler@cs.berkeley.edu, current@freebsd.org, ken@area238.residence.gatech.edu, marc@bowtie.nl, nisha@cs.berkeley.edu, pattrsn@cs.berkeley.edu, wollman@lcs.mit.edu, wscott@ichips.intel.com
Subject:   Re: more on fast bcopy
Message-ID:  <199605080007.RAA10390@sunrise.cs.berkeley.edu>
In-Reply-To: <199605072341.JAA11831@godzilla.zeta.org.au> (message from Bruce Evans on Wed, 8 May 1996 09:41:48 +1000)

 * >Yeah, we were running into problems with this.  Can you tell us how to
 * >do it? ;)
 * 
 * Something like:
 * 
 * 	subl	$108,%esp
 * 	movl	%cr0,%edx
 * 	pushl	%edx		# if used
 * 	clts
        ^^^^
Oops, didn't know about that one. ;)

 * 	fnsave	(%esp)
 * 	...
 * 
 * 	frstor	(%esp)
 * 	popl	%edx		# if used
 * 	movl	%edx,%cr0
 * 	addl	$108,%esp
 * 
 * The stack may need to be larger.
 * 
 * The complications involving IRQ13 don't apply since this method is too slow
 * to use on systems with external coprocessors.
 * 
 * The commented out code in fpunrolled.s doesn't preserve CR0_TS.

Really?  This is what I had:

       movl %cr0,%edx
       movl $8, %eax   /* CR0_TS */
       not %eax
       andl %eax,%edx  /* clear CR0_TS */
       movl %edx,%cr0
        :
       andl $8,%edx
       movl %cr0,%eax
       orl %edx, %eax  /* reset CR0_TS to the original value */
       movl %eax,%cr0

The original value of %cr0 is saved in %edx, and the CR0_TS bit is
extracted and then or'ed back into %cr0 at the end.

I did it this way because I didn't know if any of the other bits in
%cr0 would change inside the loop.
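
A minimal C model of the bit handling above, as a sketch only: %cr0 is
replaced by an ordinary variable, and fake_cr0 and both function names
are hypothetical (they are not from fpunrolled.s).  Tracing it suggests
that the listing masks the TS bit out of %edx before %edx is reused, so
the later "andl $8,%edx" always yields zero; keeping an untouched copy
of the entry value avoids that.

```c
#include <stdint.h>

#define CR0_TS 0x8u   /* task-switched flag, bit 3 of %cr0 */

/* Hypothetical stand-in for %cr0; real code needs ring 0 and movl to/from %cr0. */
static uint32_t fake_cr0;

/* Mirrors the posted listing step by step, using the same register names. */
uint32_t restore_as_posted(uint32_t cr0_on_entry)
{
    uint32_t edx, eax;

    fake_cr0 = cr0_on_entry;
    edx = fake_cr0;        /* movl %cr0,%edx                         */
    eax = CR0_TS;          /* movl $8,%eax                           */
    eax = ~eax;            /* not %eax                               */
    edx &= eax;            /* andl %eax,%edx: TS now cleared in edx  */
    fake_cr0 = edx;        /* movl %edx,%cr0                         */
    /* ... FP copy loop would run here ... */
    edx &= CR0_TS;         /* andl $8,%edx: bit was cleared, gives 0 */
    eax = fake_cr0;        /* movl %cr0,%eax                         */
    eax |= edx;            /* orl %edx,%eax: ORs in 0                */
    fake_cr0 = eax;        /* movl %eax,%cr0                         */
    return fake_cr0;
}

/* Variant that keeps the entry value untouched in a separate variable. */
uint32_t restore_with_separate_copy(uint32_t cr0_on_entry)
{
    uint32_t saved;

    fake_cr0 = cr0_on_entry;
    saved = fake_cr0;               /* untouched copy of the entry value */
    fake_cr0 = saved & ~CR0_TS;     /* clear TS for the FP loop          */
    /* ... FP copy loop would run here ... */
    fake_cr0 |= saved & CR0_TS;     /* put the original TS bit back      */
    return fake_cr0;
}
```

In the assembly this would amount to doing the clearing in %eax (or
another scratch register) and leaving %edx alone until the end.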

By the way, the problems we were seeing were random file corruptions.
I thought the cause was that the FP registers aren't saved as part of
the context switch (so although we save and restore them on entry to
and exit from our function, something else could come along and mess
them up).  Would that explain this?

 * I don't think more unrolling is good.  It will bust the I-cache and it
 * should be possible to schedule the loop control instructions to take
 * essentially zero time compared with the D-cache-missing memory access
 * instructions.

Hmm....

Satoshi
