Date: Sun, 31 Jul 2016 19:35:27 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Mateusz Guzik <mjg@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r303583 - head/sys/amd64/amd64
Message-ID: <20160731163527.GZ83214@kib.kiev.ua>
In-Reply-To: <20160731220407.Q3033@besplex.bde.org>
References: <201607311134.u6VBY81j031059@repo.freebsd.org> <20160731220407.Q3033@besplex.bde.org>
On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote:
> On Haswell, "rep stos" takes about 25 cycles to start up, and the function
> call overhead is in the noise.  25 cycles is a lot.  Haswell can move
> 32 bytes/cycle from L2 to L2, so it misses moving 800 bytes or 1/5 of a
> page in its startup overhead.  Oops, that is for "rep movs".  "rep stos"
> is similar.
>

The commit message contained a probable explanation of why the change
demonstrated a measurable improvement in a non-microbenchmark load.

That said, the only thing I am answering and asking about here is the
above claim of a 25-cycle overhead for rep;stosq on hsw.  I am curious
how the overhead was measured.

Note: Agner Fog's tables state that fast mode takes <2n uops and has a
reciprocal throughput of 0.5n in the worst case, and do not demonstrate
any setup overhead for hsw.
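
The arithmetic behind the quoted "800 bytes or 1/5 of a page" figure can
be checked with a small sketch.  It takes the 25-cycle startup and the
32 bytes/cycle rate from the quoted text as given; the 4096-byte page
size and the helper names bytes_lost() and page_fraction() are
assumptions introduced only for illustration.  It reproduces the
calculation and says nothing about whether the 25-cycle figure is right.

```c
#include <assert.h>

/*
 * Bytes that could have been moved during a fixed startup delay,
 * given a sustained copy rate.  Both inputs come from the quoted
 * claim (25 cycles, 32 bytes/cycle); they are not measured here.
 */
static unsigned long
bytes_lost(unsigned long startup_cycles, unsigned long bytes_per_cycle)
{
	return (startup_cycles * bytes_per_cycle);
}

/*
 * Fraction of a page that the lost bytes represent.  The page size is
 * supplied by the caller; 4096 is the assumed amd64 base page size.
 */
static double
page_fraction(unsigned long lost, unsigned long page_size)
{
	assert(page_size > 0);
	return ((double)lost / (double)page_size);
}
```

With these inputs, bytes_lost(25, 32) gives 800 bytes, and
page_fraction(800, 4096) is a little under 0.2, which matches the
"1/5 of a page" in the quoted text.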