Date: Tue, 4 Dec 2007 16:16:23 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Karsten Behrmann <BearPerson@gmx.net> Cc: freebsd-arch@freebsd.org Subject: Re: Code review request: small optimization to localtime.c Message-ID: <20071204153851.C3662@delplex.bde.org> In-Reply-To: <20071203235929.685d3674@Karsten.Behrmanns.Kasten> References: <20071128.151021.709401576.imp@bsdimp.com> <20071203235929.685d3674@Karsten.Behrmanns.Kasten>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 3 Dec 2007, Karsten Behrmann wrote: > Ah, the old "double-checked locking" thing. Take what I say here > with a grain of salt: I'm a plain programmer, not very versed in > different architectures, so I just pick up what I hear. > > essentially, the problem with the scheme is that plain variable > writes may not be ordered as expected, making them unsuitable to > communicate between threads and processors, depending on compiler > and architecture. > > Let's take the example > > pthread_mutex_lock(); > if (!is_set) { > a = 3; > b = 4; > c = a * b; > is_set = true; > } > pthread_mutex_unlock(); > > Now, an optimizing compiler might put a and b into registers, > compute a * b while storing a 1 to is_set, and then store a, b, and > c from registers into memory. As I said, I'm not actually sure > where compilers are allowed to do this. > > Also, on some architectures, the caching structure may cause writes > to is_set to show up earlier on another CPU than writes to other > parts of memory, if you're unlucky. As alfred said, the locking implementation is supposed to hide these problems so that most code doesn't need to worry about them. I don't know what pthread does, but in for kernel mutexes on i386 with CC=gcc, the first problem is handled by putting a "memory" clobber in the asm for atomic_cmpset to force the compiler to load and store all memory variables (not just the ones in args for the asm), while the second problem is handled implicitly by using a locked instruction for cmpset and/or other magic i386 cache behaviour. The atomic cmpset is, more precisely, atomic_cmpset_acq for locking and atomic_cmpset_rel for unlocking. It is acquire/release semantics that handles the second problem. On i386's acquire/release semantics happen almost automatically so at the lowest level there is only atomic_cmpset_int and everything else is #defined as that. Some sloppy (buggy?) callers of the atomic_ cmpset family use only atomic_cmpset_int, and one of the examples in atomic.9 doesn't mention acq or rel. It isn't clear what a non- acquiring non-releasing cmpset is good for. > In fact, in your patch, you sometimes actually set the variable > getting checked before setting the data it is supposed to protect ;-) > >> - _MUTEX_LOCK(&gmt_mutex); >> if (!gmt_is_set) { >> - gmt_is_set = TRUE; >> + _MUTEX_LOCK(&gmt_mutex); >> + if (!gmt_is_set) { >> + gmt_is_set = TRUE; > > XXX - at this point, gmt_is_set is TRUE but gmtptr is not initialized yet. This is a bug. It shows how tricky locking becomes again if you don't use the normal mechanism of always acquiring the lock first. With the normal mechanism, the ordering inside locked regions doesn't matter since variables are only accessed inside of locked regions, but the optimized version has accesses outside of locked regions. Setting gmt_is_set after initializing gmtptr isn't much better, since the unlock hasn't had a chance to force everything to memory so gmt_is_set might reach memory before gmtptr. I think the order needs to be something like: gmtptr = ... unlock lock gmt_is_set = TRUE; unlock, or slightly shorter using explicit barrier operations instead of implicit ones in unlock. Even if we could guarantee that gmtptr and gmt_is_set are not written prior to the unlock, there would still be a problem with the optimized version unless we could guarantee the order of the writes triggered by the unlock. The unoptimizated version is not affected by this complication either. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20071204153851.C3662>