Date:      Tue, 13 Jul 1999 20:13:31 +1000
From:      Peter Jeremy <jeremyp@gsmx07.alcatel.com.au>
To:        dillon@apollo.backplane.com
Cc:        freebsd-current@FreeBSD.ORG
Subject:   Re: "objtrm" problem probably found (was Re: Stuck in "objtrm")
Message-ID:  <99Jul13.195541est.40353@border.alcanet.com.au>
In-Reply-To: <199907130501.WAA74171@apollo.backplane.com>

Matthew Dillon <dillon@apollo.backplane.com> wrote:
>:I'm not sure there's any reason why you shouldn't.  If you changed the
>:semantics of a stack segment so that memory addresses below the stack
>:pointer were irrelevant, you could implement a small, 0-cycle, on-chip
>:stack (that overflowed into memory).
>
>    This would be relatively complex and also results in cache coherency
>    problems.

I agree that there would be additional complexity.  I believe that the
`on-chip stack cache' part has been implemented on some Forth chips
(where stack performance is rather critical), though I don't know
whether any of them were MP-capable.

My reason for suggesting the change to stack semantics was also to
allow cache line allocation without a memory fetch (i.e. if SP=1000,
a push would result in ff0..fff (or fe0..fff) being allocated as
a cache line without bothering to fetch ff0..ffb).  I'm not sure
whether this change would actually provide a measurable improvement
though (I suspect that it wouldn't).
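
The address arithmetic in the example above can be checked with a short
sketch. It is only an illustration under stated assumptions: addresses are
hexadecimal, a push is 4 bytes, the stack grows downward, and the line size
is 16 or 32 bytes. The function name and parameters are hypothetical and do
not come from any real CPU or tool.

```python
def push_line_allocation(sp, push_size=4, line_size=16):
    """Return (line_start, line_end, skipped) for one push from stack pointer sp.

    skipped is the (first, last) byte range of the cache line that lies
    below the new stack pointer, i.e. the bytes the proposed semantics would
    not bother to fetch. It is None if the pushed data starts at line_start.
    Assumes line_size is a power of two and the push does not straddle lines.
    """
    new_sp = sp - push_size                  # stack grows downward
    line_start = new_sp & ~(line_size - 1)   # round down to the line boundary
    line_end = line_start + line_size - 1
    skipped = (line_start, new_sp - 1) if new_sp > line_start else None
    return line_start, line_end, skipped
```

With SP=0x1000 and a 16-byte line this gives the line ff0..fff with
ff0..ffb left unfetched; with a 32-byte line it gives fe0..fff.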

In this case, I believe cache coherency can be bypassed.  The stack
segment is only needed on one processor at a time.  If there's an
interrupt on that CPU, the on-chip stack would flush to memory so
that the memory image was consistent.
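
As a rough model of that behaviour (not a description of any real chip),
the sketch below keeps the newest few stack entries "on-chip", spills the
oldest entry to memory when the on-chip part is full, and writes everything
out on a flush, which is what an interrupt would trigger. The class name,
the capacity, and the write counter are all assumptions made for
illustration.

```python
class OnChipStack:
    """Toy model of a small on-chip stack that overflows into memory."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.chip = []          # newest entries; top of stack is the last item
        self.memory = []        # older entries spilled to memory, oldest first
        self.memory_writes = 0  # how many words have been written to memory

    def push(self, value):
        if len(self.chip) == self.capacity:
            # On-chip part is full: spill the oldest on-chip entry to memory.
            self.memory.append(self.chip.pop(0))
            self.memory_writes += 1
        self.chip.append(value)

    def pop(self):
        if self.chip:
            return self.chip.pop()   # served on-chip, no memory access
        if not self.memory:
            raise IndexError("pop from empty stack")
        return self.memory.pop()     # on-chip part empty: read from memory

    def flush(self):
        """Write all on-chip entries to memory, e.g. when an interrupt arrives."""
        self.memory_writes += len(self.chip)
        self.memory.extend(self.chip)
        self.chip.clear()
```

After a flush, `memory` holds the whole stack in order, so the memory image
is consistent for whatever runs next on that CPU.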

At the minimal end, another way of looking at it would be as an
`invisible' branch-and-link register - capable of saving a single
return address as long as nothing else was pushed onto the stack.

> A solution already exists:  It's called branch-and-link,
One case where the IBM/360 accidentally got it right :-).

>    but Intel cpu's do not use it because Intel cpu's do not have enough
>    registers (makes you just want to throw up -- all that MMX junk and they
>    couldn't add a branch and link register! ).
But all that MMX junk makes Doom (or whatever) look much better
and that's far more critical :-).

>  The key with branch-and-link
>    is that the lowest subroutine level does not have to save/restore the 
>    register, making entry and return two or three times faster then 
>    subroutine calls that make other subroutine calls.
I seem to recall reading somewhere that leaf subroutine performance
is also fairly important for overall performance (though that may
have been before C compilers learnt how to inline functions).
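
To make the leaf-routine point concrete, here is a toy count of
return-address memory accesses under the two calling schemes. It is a
sketch under simplifying assumptions: a stack-based call costs one write
(push) and one read (pop) per call, while with a link register only a
routine that itself makes calls has to save and restore the register. It
ignores caches, pipelining and everything else that matters on real
hardware. A routine is represented as the list of routines it calls, so a
leaf is an empty list; the function name is hypothetical.

```python
def return_address_accesses(callees, use_link_register):
    """Count return-address memory accesses for one call and all nested calls."""
    is_leaf = not callees
    if use_link_register:
        # Leaf: the return address stays in the register, no memory traffic.
        # Non-leaf: save the register before calling out, restore afterwards.
        own = 0 if is_leaf else 2
    else:
        # Stack-based call/return: push on call, pop on return, every time.
        own = 2
    return own + sum(return_address_accesses(c, use_link_register)
                     for c in callees)


# A routine that calls two leaves and one routine that calls a single leaf.
tree = [[], [], [[]]]
stack_cost = return_address_accesses(tree, use_link_register=False)
link_cost = return_address_accesses(tree, use_link_register=True)
```

For this tree the stack-based scheme touches memory 10 times and the
link-register scheme 4 times; the more of the calls that go to leaves, the
bigger the gap.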

Peter

