Date:      Wed, 6 Dec 2006 19:50:55 +0100
From:      "Attilio Rao" <attilio@freebsd.org>
To:        "ranjith kumar" <ranjith_kumar_b4u@yahoo.com>
Cc:        freebsd-ia32@freebsd.org
Subject:   Re: prefetching on pentium4
Message-ID:  <3bbf2fe10612061050y6fa458abw3b1ace0cd1bebd37@mail.gmail.com>
In-Reply-To: <20061206042834.59293.qmail@web58611.mail.re3.yahoo.com>
References:  <3bbf2fe10611160753q3303d81bw515bffe9af4ee0c9@mail.gmail.com> <20061206042834.59293.qmail@web58611.mail.re3.yahoo.com>

2006/12/6, ranjith kumar <ranjith_kumar_b4u@yahoo.com>:
> Hi,
>     There are 4 types of prefetch instructions on the
> Pentium 4 (IA-32) processor:
> prefetchnta, prefetcht0, prefetcht1, prefetcht2.
>
> In the case of the Pentium 4, the IA-32 optimization manuals say
> that prefetcht0, prefetcht1 and prefetcht2 are identical.
>
> They also say that ONLY the prefetchnta instruction prefetches
> data into the L2 cache without polluting the caches.
>
> When all four instructions prefetch data into the
> L2 cache (not into the L1 cache), what is the meaning of
> saying that prefetchnta does not pollute the caches?
>
> I.e., what is the difference between prefetchnta and
> the other instructions?

First of all, it is important to note that a prefetch* instruction is
only a hint to the CPU and not a *command*, so the CPU decides (in an
unspecified way) whether or not to honor the caching request.
From this point of view, the prefetch* instructions are meant to be as
accommodating as possible for the CPU.
The different numbers denote different levels of urgency for the CPU
(0 = most critical, 2 = least critical); the more critical the hint,
the higher in the cache hierarchy (closer to the CPU) the cache line
is prefetched.
Hypothetically, this would mean:

prefetcht0 -> L1 prefetching
prefetcht1 -> L2 prefetching
prefetcht2 -> L3 prefetching

And this is what really happens, for example, on the P3 (since the P3
has no L3 cache, prefetcht2 == prefetcht1).
On the P4 things are different because you cannot manipulate the L1
cache directly, so what happens is:

prefetcht0 -> L2 prefetching
prefetcht1 -> L2 prefetching
prefetcht2 -> L3 prefetching
(if no L3 cache is present, prefetcht2 is the same as the others; hence
the statement that all three instructions behave identically).
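
To make the "hint, not command" point concrete, here is a minimal
sketch in C. It assumes a GCC- or Clang-compatible compiler and uses
the __builtin_prefetch builtin rather than hand-written assembly. The
HINT_* names, PREFETCH_AHEAD and both function names are made up for
this illustration, and the mapping of locality values to the four
instructions is what these compilers typically emit on x86, not
something guaranteed here.

```c
#include <stddef.h>

/* Hypothetical names for the "locality" argument of __builtin_prefetch.
 * Assumption: on x86, GCC/Clang typically emit prefetcht0 for 3,
 * prefetcht1 for 2, prefetcht2 for 1 and prefetchnta for 0. */
enum { HINT_NTA = 0, HINT_T2 = 1, HINT_T1 = 2, HINT_T0 = 3 };

/* How many elements ahead to prefetch; a tuning guess, not a rule. */
#define PREFETCH_AHEAD 16

/* Sum an array while hinting that upcoming elements will be read. */
long sum_with_prefetch(const long *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&v[i + PREFETCH_AHEAD], 0, HINT_T0);
        total += v[i];
    }
    return total;
}

/* Same loop without any hint, for comparison. */
long sum_plain(const long *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Both functions always return the same value: the CPU is free to ignore
the hint, so only timing can differ, never the result. Swapping HINT_T0
for HINT_T1 or HINT_T2 changes which cache level is requested, which on
a P4 should make no visible difference if the mapping above holds.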

prefetchnta is completely different because it fetches a cache line
into the NT (non-temporal) cache structure.
Non-temporal caches are global caches which are particularly powerful
because they do not need snooping messages between CPUs (and, in this
way, they reduce the CPUs<->caches traffic); they are used by the NTI
family of instructions.
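
As an illustration of where a non-temporal hint fits, here is a small
sketch in C of a single pass over a buffer that will not be read again
soon. It assumes a GCC- or Clang-compatible compiler; locality 0 in
__builtin_prefetch typically becomes prefetchnta on x86. LINE_BYTES,
LINES_AHEAD and checksum_once are hypothetical names, and the 64-byte
line size is an assumption rather than a value taken from the manuals.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES  64  /* assumed cache-line size */
#define LINES_AHEAD 4   /* prefetch distance in lines; a tuning guess */

/* Add up every byte of buf exactly once. The data is used one time
 * only, so the prefetch asks for "no temporal locality" (locality 0). */
uint32_t checksum_once(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        /* Issue one hint per cache line, a few lines ahead. */
        if (i % LINE_BYTES == 0 && i + LINES_AHEAD * LINE_BYTES < len)
            __builtin_prefetch(buf + i + LINES_AHEAD * LINE_BYTES, 0, 0);
        sum += buf[i];
    }
    return sum;
}
```

The intent is that data streamed through once does not push out lines
the program still needs. Whether that pays off depends on the CPU and
the access pattern, so it is worth measuring before relying on it.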

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


