From: Stephan Uphoff
To: Matthew Dillon
Cc: freebsd-hackers@freebsd.org
Date: Fri, 03 Jun 2005 22:29:02 -0400
Subject: Re: Possible instruction pipelining problem between HT's on the same die ?
Message-Id: <1117852142.27369.13418.camel@palm>
In-Reply-To: <200506032247.j53MlQBg062506@apollo.backplane.com>
List-Id: Technical Discussions relating to FreeBSD

On Fri, 2005-06-03 at 18:47, Matthew Dillon wrote:
> :This is normal behaviour.
> :Take a look at IA-32 Intel Developers ... Vol 3,
> :Section: 7.2.2 for details + solutions.
> :
> :Stephan
>
>     Ok.. that section seems to indicate that speculative reads
>     can pass writes, but it also says that the pipeline sniffs the address
>     within the processor and ensures proper ordering.  The latter part
>     makes sense within the context of a single cpu, but the big question is:
>     Is that supposed to hold true for interactions with HT cpus (that share
>     the pipeline) as well?  Or not ?  It seems not.

Memory ordering in logical HT CPUs is the same as in real CPUs
(see 7.6.1.9)

>
>     Speculative reads creating out of order situations seems to be the
>     biggest issue.  The AMD manual (Programmers manual volume 3 page
>     186, MFENCE instruction) says this:
>
>     "The MFENCE instruction is weakly-ordered with respect to data and
>     instruction prefetches.  Speculative loads initiated by the processor,
>     or specified explicitly using cache-prefetch instructions, can be
>     reordered around an MFENCE".

Speculative loads can pass MFENCE, but they cannot pass load operations
issued before the MFENCE.

>     This seems to be different than what the Intel manual says, and doesn't
>     make much sense.  What's the point of having a fence instruction if it
>     can't guarantee read/write ordering?  Is the AMD manual simply wrong ?

Not wrong - just confusing.

    READ A
    MFENCE
    READ B

can cause

    READ A
    Speculative READ B
    MFENCE

but NOT

    Speculative READ B
    READ A
    MFENCE

>     Other than that, the Intel manual does indicate that speculative reads
>     will not pass locked bus cycle instructions (the AMD manual says nothing
>     about that that I can see).

AMD Volume 1 - 3.9.2

>     So, presumably, doing a dummy locked bus
>     cycle operation on e.g. the top of the stack, such as Linux does, would
>     be sufficient to ensure read ordering.  Would you concur with that
>     assessment?

Yes

>     What's really horrible here is that the 'old' value of the data being
>     used is modified at location A something like 30 instructions prior to
>     the instruction that updates the index (B).  I think this is a
>     situation that can only occur in an HT configuration, and then only if
>     the speculative read issued by the HT cpu is being held across
>     30 instructions executed by the primary cpu before the HT cpu issues the
>     read of B.
>
>     cpu #0                  cpu #1 (HT cpu on same die as cpu #0)
>
>                             speculatively read A
>     write A                 (stalled)
>     [30 instructions]       (stalled x 30)
>     write B                 (stalled)
>                             read B
>                             see that B has been updated
>                             read A (get old value for A instead of new)
>
>     Is that even possible ?  Not only the 30 instruction latency, but also
>     the fact that even with the shared pipeline you have a speculative read
>     on the HT cpu surviving 30 instructions running on cpu #0 (but only one
>     or two on the HT cpu)... even though they share the same pipeline.

Take a look at store buffers.
Reads have a higher priority than writes on some CPUs, and data may even
sit in a store buffer indefinitely (where it cannot be observed by
other CPUs).
Reading some of the Intel and AMD errata gives you a good picture.

Stephan
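
For illustration only, the "write A, write B" versus "read B, read A"
sequence in the diagram above can be sketched with C11 atomics. This is an
assumption-laden sketch, not code from the thread: C11 postdates this
discussion, and the names data_a, index_b, publish and try_consume are
hypothetical stand-ins for locations A and B. The release store and acquire
load ask the compiler to emit whatever ordering the target CPU needs, in
place of a hand-written MFENCE or dummy locked instruction.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the thread's locations A (data) and B (index). */
static int data_a;
static atomic_int index_b;

/* Writer side (cpu #0 in the diagram): write A, then write B. */
void publish(int value)
{
    data_a = value;                               /* write A */
    /* Release store: a reader that observes the new B through an
     * acquire load is guaranteed to also observe the write to A. */
    atomic_store_explicit(&index_b, 1, memory_order_release);  /* write B */
}

/* Reader side (cpu #1): read B, then read A.
 * Returns false if B has not been updated yet. */
bool try_consume(int *out)
{
    /* Acquire load of B: the read of A below may not be satisfied
     * by a load performed before this one. */
    if (atomic_load_explicit(&index_b, memory_order_acquire) == 0)
        return false;
    *out = data_a;                                /* read A */
    return true;
}
```

In real use publish() and try_consume() would run on different CPUs; calling
them one after the other from a single thread only checks the logic, not the
ordering.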