Date:      Mon, 22 Oct 2001 02:23:34 -0700
From:      Peter Wemm <peter@wemm.org>
To:        Doug Rabson <dfr@nlsystems.com>
Cc:        Marcel Moolenaar <marcel@xcllnt.net>, ia64@FreeBSD.ORG
Subject:   Re: Hazards [was: Re: cvs commit: src/sys/ia64/ia64 sal.c] 
Message-ID:  <20011022092334.1F3F138CC@overcee.netplex.com.au>
In-Reply-To: <20011022094201.L549-100000@salmon.nlsystems.com> 

Doug Rabson wrote:
> On Sun, 21 Oct 2001, Marcel Moolenaar wrote:
> 
> > On Sun, Oct 21, 2001 at 02:34:35PM -0700, Peter Wemm wrote:
> > >
> > > 52: 3:      tbit.nz p6,p0=in0,0 ;;
> > > 53: (p6)    st1     [in0]=r0,1
> > > 54: (p6)    add     in1=-1,in1
> > > 55:
> > > 56:         tbit.nz p6,p0=in0,1 ;;
> > > 57: (p6)    st2     [in0]=r0,2
> > > 58: (p6)    add     in1=-2,in1
> > > 59:
> > > 60:         tbit.nz p6,p0=in0,2 ;;
> > > 61: (p6)    st4     [in0]=r0,4
> > > 62: (p6)    add     in1=-4,in1
> > > 63:
> > > 64:        ;;
> >
> > [snip]
> >
> > > but that hardly seems efficient.  could we copy in0 to somewhere else in
> > > order to avoid the RAW?  the bits we're interested in are not going to change
> > > by the st1/2/4 adds.
> >
> > The code is inherently sequential in that the result of the
> > postinc is used by subsequent tbit instructions. One way to
> > increase ILP is to do an aligned ld8, zero-out the bytes
> > that need to be zeroed in the temporary register and write
> > the result back. in0 (ptr) and in1 (size) can be updated
> > without there being an immediate use for them. The code
> > will be endianness sensitive though. Something like:
> >
> > 	and	t0 = 0xf8, in0;;	// sign-extension
> > 	ld8	t1 = [t0];;
> > 	// Zero-out the bytes in t1 that need zeroed
> > 	st8	[t0] = t1
> >
> > in0 can be updated by a simple add:
> >
> > 	add	in0 = 8, t0
> >
> > in1 can be updated by the following sequence:
> >
> > 	or	t2 = 7, in0
> > 	mov	t3 = in1 ;;
> > 	sub	in1 = t3, t2
> >
> > Both updates can be performed concurrently with the zeroing
> > of t1. The zeroing of t1 can be sequence of predicated dep
> > instructions.
> >
> > Just a thought,
> 
> I'm not too worried about performance here - this is just cleaning up the
> pointer so that we can do an aligned store in the main loop. I'm just
> going to add the stops as Peter suggested. We can revisit this (and all
> the other string code) and work on performance later. The whole lot
> probably needs rewriting. Perhaps Intel has some sample code...

Heh.  There was another one in vfork().  This might have been what broke
csh (which depends heavily on vfork()).
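For what it's worth, Marcel's aligned read-modify-write suggestion can be
sketched in C.  This is a little-endian sketch with invented names (the real
thing would be ia64 assembly, with the masking done by predicated dep
instructions as he describes):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of Marcel's idea: zero the bytes from p up
 * to the next 8-byte boundary with one aligned ld8/st8 pair instead of
 * the sequential st1/st2/st4 chain.  Little-endian only, matching the
 * endianness caveat in the mail. */
static void zero_head(unsigned char **pp, size_t *np)
{
    uintptr_t addr = (uintptr_t)*pp;
    uintptr_t off = addr & 7;          /* misalignment within the word */
    if (off == 0)
        return;                        /* already 8-byte aligned */
    unsigned char *t0 = (unsigned char *)(addr - off);
    uint64_t t1;
    memcpy(&t1, t0, 8);                /* ld8  t1 = [t0] */
    /* Keep the 'off' low-order bytes (those below p), zero the rest;
     * on a little-endian machine the cleared bytes are p .. t0+7. */
    t1 &= (UINT64_C(1) << (8 * off)) - 1;
    memcpy(t0, &t1, 8);                /* st8  [t0] = t1 */
    *pp = t0 + 8;                      /* add  in0 = 8, t0 */
    *np -= 8 - off;                    /* in1 -= bytes zeroed */
}
```

The point of the scheme is that the pointer and length updates depend only on
t0, not on the masked value t1, so they can issue in parallel with the
zeroing rather than serializing on each postincremented store.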

I've modified my ia64-final gcc specs file to fix the -D__linux__ etc crap
and also added -x to the asm flags so that gcc/gas will warn about
dependency violations.  The relevant lines:

*asm_final:
-x %|

*link:
  %{shared:-shared}   %{!shared:     %{!static:       %{rdynamic:-export-dynamic}       %{!dynamic-linker:-dynamic-linker /usr/libexec/ld-elf.so.1}}       %{static:-static}}

*predefines:
-D__ia64 -D__ia64__ -D__FreeBSD__=5 -D_LONGLONG -Dunix -D__LP64__ -D__ELF__ -Asystem(FreeBSD) -Acpu(ia64) -Amachine(ia64)
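For illustration, the kind of sequence -x makes gas complain about is a read
in the same instruction group as the write that feeds it (a hypothetical
fragment; the exact diagnostic wording may vary):

```
        add     r14 = 8, r32        // writes r14
        ld8     r15 = [r14]         // reads r14 with no stop in between:
                                    // gas -x flags a RAW dependency violation
        // fixed: add r14 = 8, r32 ;;   (the ';;' stop ends the group)
```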

There's bunches of other stuff that is out of sync.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5

