Date:      Mon, 19 Dec 2011 19:36:21 -0000
From:      "Niall Douglas" <s_sourceforge@nedprod.com>
To:        arch@freebsd.org, threads@freebsd.org
Subject:   Re: [Patch] C1X threading support
Message-ID:  <4EEF9235.24779.B2519D55@s_sourceforge.nedprod.com>
In-Reply-To: <20111218115326.GD50300@deviant.kiev.zoral.com.ua>
References:  <85477.1324155737@critter.freebsd.dk>, <86ty4y4rj5.fsf@ds4.des.no>, <20111218115326.GD50300@deviant.kiev.zoral.com.ua>

On 18 Dec 2011 at 13:53, Kostik Belousov wrote:

> Well, the reverse was exactly _my_ point. I cannot find the description of
> how the abstract C machine behaves, in the presence of multiple threads
> of execution. The atomics chapter covers only some special operations,
> which are added in the new revision.

Indeed. It is deliberately unspecified.

> E.g., there is absolutely no mention of the memory changes visibility,
> or guarantees of atomicity of the assignments/reads etc. IMO, the threading
> was slapped nearby, and the standard is not useful as-is. I am sorry if
> I missed the parts.

The memory model has been thought through very carefully by some of 
the leading domain experts in the area. What the C1X standard 
specifies is per CPU core: if you do an acquire, acquire, acquire, 
release, acquire, then you get a certain guaranteed minimal ordering 
of reads and writes on that particular core as seen from other 
cores. Note that I used the qualifier "minimal". The whole point of 
acquire/release is that both the compiler and the CPU are given the 
maximum degree of freedom to reorder everything else.
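
A minimal sketch of that guarantee, assuming a C1X <stdatomic.h> is 
available. The names payload, ready, publish and try_consume are 
hypothetical and chosen only for illustration, and thread creation is 
left out. One thread would call publish() and another would poll 
try_consume():

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_bool ready;  /* static storage, so it starts out false */

/* Writer side: fill in the data, then publish it with a release store.
 * The release store may not be reordered before the payload write. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: an acquire load that observes true is guaranteed to
 * also observe the payload write that preceded the release store. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;      /* nothing published yet */
    *out = payload;
    return true;
}
```

Only the ordering between the payload write and the flag is pinned 
down here. Accesses unrelated to this pair remain free to be reordered 
by the compiler and the CPU.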

Note that the CPU may ignore the ordering internally so long as the 
ordering remains valid as observed externally. For example, if a 
cache line is held exclusively by one core, the CPU may dispense 
with any ordering constraints within that cache line. Since no other 
core can observe the difference, this is harmless.

Right now Intel x86/x64 has a very strong memory model: all loads 
acquire and all stores release unless you use special SSE/AVX 
opcodes (such as non-temporal stores) to say otherwise. This means 
that code which works fine on Intel can fail intermittently on 
processors with weaker memory models, such as ARM or POWER. It's 
worth bearing this in mind.
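
One way to avoid depending on the Intel behaviour is to spell the 
ordering out explicitly. The sketch below uses C1X fences around 
relaxed accesses. The names shared_value, flag, portable_send and 
portable_receive are hypothetical, and as before the two functions are 
meant to be called from different threads:

```c
#include <stdatomic.h>
#include <stdbool.h>

static int shared_value;   /* ordinary, non-atomic data */
static atomic_int flag;    /* static storage, so it starts out 0 */

void portable_send(int value)
{
    shared_value = value;
    /* Release fence: the write above may not move below the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

bool portable_receive(int *out)
{
    if (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        return false;      /* nothing sent yet */
    /* Acquire fence: the read below may not move above the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = shared_value;
    return true;
}
```

On x86/x64 these two fences typically cost no instructions and only 
restrain the compiler, whereas on a weakly ordered processor the 
compiler is expected to emit real barrier instructions. The same 
source is therefore correct on both.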

Niall

-- 
Technology & Consulting Services - ned Productions Limited.
http://www.nedproductions.biz/. VAT reg: IE 9708311Q. Company no: 
472909.





