From: Torbjorn Granlund
To: Bruce Evans
cc: asami@cs.berkeley.edu, current@freebsd.org, hasty@rah.star-gate.com, mrami@minerva.cis.yale.edu, nisha@cs.berkeley.edu, tege@matematik.su.se
Subject: Re: optimized bzeros found harmful (was: fast memory copy ...)
In-reply-to: Your message of "Sat, 06 Apr 1996 09:13:46 +1000." <199604052313.JAA28956@godzilla.zeta.org.au>
Date: Sat, 06 Apr 1996 19:54:25 +0200
Message-Id: <199604061754.TAA17355@insanus.matematik.su.se>

> This behaviour is consistent with the data being zeroed usually not being in the L2 cache. RBW is 33% slower in that case on my system. Other cases: if the data is in the L2 cache but not in the L1 cache, then RBW is between 0% and 33% faster; if the data is in the L1 cache, then RBW is 8.5 times faster (740MB/s!).

This must be a misunderstanding! If the data is really in the L1 cache, the read-before-write is wasted and just contributes to the overhead.

The read-before-write is effective if and only if the data is not in the L1 cache. In that case, it forces allocation of the cache line in the L1 cache, and thereby allows a 14x peak speedup.
If you observe any other behaviour, the timing framework is misleading you. All other CPUs I know of have caches that allocate on write.
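
For readers who want to see the read-before-write idea concretely, here is a minimal sketch in C. It is not the code that was benchmarked in this thread. The name bzero_rbw and the 32-byte CACHE_LINE value are assumptions made for illustration only. The sketch reads one byte from each line before zeroing it, which on a cache without allocate-on-write should pull the line into L1 so the following stores hit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed L1 cache line size in bytes (illustrative, not taken from the post). */
#define CACHE_LINE 32

/*
 * Zero n bytes at dst, reading one byte of each CACHE_LINE-sized chunk
 * before writing it. The result is correct for any alignment; the read
 * covers exactly one cache line per chunk only when dst is line-aligned.
 */
static void bzero_rbw(void *dst, size_t n)
{
    unsigned char *p = dst;

    assert(dst != NULL || n == 0);

    while (n >= CACHE_LINE) {
        /* Volatile read: the compiler must keep it even though the value is unused. */
        (void)*(volatile unsigned char *)p;
        memset(p, 0, CACHE_LINE);
        p += CACHE_LINE;
        n -= CACHE_LINE;
    }
    if (n > 0) {
        /* Tail shorter than one line: zero it without the extra read. */
        memset(p, 0, n);
    }
}
```

As argued above, the extra read should only pay off when the destination is not already in the L1 cache; when it is, the read is pure overhead. Any measurement of a routine like this needs to control which cache level the buffer starts in.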