Date: Mon, 22 Oct 2001 02:23:34 -0700
From: Peter Wemm <peter@wemm.org>
To: Doug Rabson <dfr@nlsystems.com>
Cc: Marcel Moolenaar <marcel@xcllnt.net>, ia64@FreeBSD.ORG
Subject: Re: Hazards [was: Re: cvs commit: src/sys/ia64/ia64 sal.c]
Message-ID: <20011022092334.1F3F138CC@overcee.netplex.com.au>
In-Reply-To: <20011022094201.L549-100000@salmon.nlsystems.com>
Doug Rabson wrote:
> On Sun, 21 Oct 2001, Marcel Moolenaar wrote:
>
> > On Sun, Oct 21, 2001 at 02:34:35PM -0700, Peter Wemm wrote:
> > >
> > > 52: 3:      tbit.nz p6,p0=in0,0 ;;
> > > 53:    (p6) st1     [in0]=r0,1
> > > 54:    (p6) add     in1=-1,in1
> > > 55:
> > > 56:         tbit.nz p6,p0=in0,1 ;;
> > > 57:    (p6) st2     [in0]=r0,2
> > > 58:    (p6) add     in1=-2,in1
> > > 59:
> > > 60:         tbit.nz p6,p0=in0,2 ;;
> > > 61:    (p6) st4     [in0]=r0,4
> > > 62:    (p6) add     in1=-4,in1
> > > 63:
> > > 64:         ;;
> >
> > [snip]
> >
> > > but that hardly seems efficient. could we copy in0 to somewhere else in
> > > order to avoid the RAW? the bits we're interested in are not going to change
> > > by the st1/2/4 adds.
> >
> > The code is inherently sequential in that the result of the
> > postinc is used by subsequent tbit instructions. One way to
> > increase ILP is to do an aligned ld8, zero out the bytes
> > that need to be zeroed in the temporary register and write
> > the result back. in0 (ptr) and in1 (size) can be updated
> > without there being an immediate use for them. The code
> > will be endianness sensitive though. Something like:
> >
> >	and	t0 = 0xf8, in0 ;;	// sign-extension
> >	ld8	t1 = [t0] ;;
> >	// Zero out the bytes in t1 that need to be zeroed
> >	st8	[t0] = t1
> >
> > in0 can be updated by a simple add:
> >
> >	add	in0 = 8, t0
> >
> > in1 can be updated by the following sequence:
> >
> >	or	t2 = 7, in0
> >	mov	t3 = in1 ;;
> >	sub	in1 = t3, t2
> >
> > Both updates can be performed concurrently with the zeroing
> > of t1. The zeroing of t1 can be a sequence of predicated dep
> > instructions.
> >
> > Just a thought,
>
> I'm not too worried about performance here - this is just cleaning up the
> pointer so that we can do an aligned store in the main loop. I'm just
> going to add the stops as Peter suggested. We can revisit this (and all
> the other string code) and work on performance later. The whole lot
> probably needs rewriting. Perhaps Intel has some sample code...

Heh. There was another one in vfork().
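For readers less fluent in ia64 assembly, the head-alignment prologue that the quoted tbit/st1/st2/st4 sequence implements can be sketched in C. This is illustrative only, not the actual libc code: `bzero_head` and its pointer/length-by-reference interface are invented names, and a real routine would first guarantee the length is large enough to reach alignment.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Sketch of a bzero-style alignment prologue: clear 1, then 2,
 * then 4 bytes as needed so the pointer becomes 8-byte aligned
 * before a main st8 loop takes over. Each test mirrors one
 * tbit.nz/st/add group in the quoted assembly. Assumes *lenp >= 7.
 */
static void bzero_head(unsigned char **pp, size_t *lenp)
{
    unsigned char *p = *pp;
    size_t len = *lenp;

    if ((uintptr_t)p & 1) {             /* tbit.nz p6,p0=in0,0 */
        *p = 0;                         /* (p6) st1 [in0]=r0,1 */
        p += 1;
        len -= 1;                       /* (p6) add in1=-1,in1 */
    }
    if ((uintptr_t)p & 2) {             /* tbit.nz p6,p0=in0,1 */
        *(uint16_t *)p = 0;             /* (p6) st2 [in0]=r0,2 */
        p += 2;
        len -= 2;
    }
    if ((uintptr_t)p & 4) {             /* tbit.nz p6,p0=in0,2 */
        *(uint32_t *)p = 0;             /* (p6) st4 [in0]=r0,4 */
        p += 4;
        len -= 4;
    }
    *pp = p;
    *lenp = len;
}
```

Note that each `if` tests the pointer produced by the previous post-increment, which is exactly the read-after-write dependency under discussion: a C compiler serializes this implicitly, while on ia64 the explicit `;;` stops are needed to mark the instruction-group boundaries.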
This might have been what broke csh (it is heavily dependent on vfork()).

I've modified my ia64-final gcc specs file to fix the -D__linux__ etc crap
and also put -x into the asm flags so that gcc/gas will warn about
dependency violations. The relevant lines:

*asm_final:
-x %|

*link:
%{shared:-shared} %{!shared: %{!static: %{rdynamic:-export-dynamic}
  %{!dynamic-linker:-dynamic-linker /usr/libexec/ld-elf.so.1}}
  %{static:-static}}

*predefines:
-D__ia64 -D__ia64__ -D__FreeBSD__=5 -D_LONGLONG -Dunix -D__LP64__ -D__ELF__
  -Asystem(FreeBSD) -Acpu(ia64) -Amachine(ia64)

There's bunches of other stuff that is out of sync.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5