Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 3 Dec 2008 00:09:55 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        Scott Long <scottl@samsco.org>, src-committers@freebsd.org, Kip Macy <kmacy@freebsd.org>, John Baldwin <jhb@freebsd.org>, svn-src-all@freebsd.org, jfv@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r185162 - in head: . sys/amd64/include sys/arm/include sys/conf sys/dev/bce sys/dev/cxgb sys/dev/cxgb/sys sys/dev/cxgb/ulp/iw_cxgb sys/dev/mxge sys/dev/nxge sys/i386/include sys/i386/in...
Message-ID:  <20081202232222.N1514@besplex.bde.org>
In-Reply-To: <20081201214217.GS3045@deviant.kiev.zoral.com.ua>
References:  <200811220555.mAM5tuIJ007781@svn.freebsd.org> <3c1674c90811221651u338294frcdbd99b386733851@mail.gmail.com> <20081123154138.GS6408@deviant.kiev.zoral.com.ua> <200812011407.06563.jhb@freebsd.org> <20081201214217.GS3045@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 1 Dec 2008, Kostik Belousov wrote:

> On Mon, Dec 01, 2008 at 02:07:06PM -0500, John Baldwin wrote:
>> On Sunday 23 November 2008 10:41:38 am Kostik Belousov wrote:

>>> Below is the updated patch. It includes changes made after private comments
>>> by bde@ and uses symbolic definitions for the bits in the features words.
>>> I thought about accessing a per-CPU word for serialized instruction in the
>>> slow path, but decided that it does not beneficial.\
>>
>> Is the branch really better than just doing what the atomic operations for
>> mutexes, etc. do and just use 'lock addl $0,%esp' for a barrier in all cases
>> on i386 and only bother with using the fancier instructions on amd64?  Even
>> amd64 doesn't use *fence yet for the atomic ops actually.  I have had a patch
>> to use it for years, but during testing there was no discernable difference
>> between the existing 'lock addl' approach vs '*fence'.  I'd much rather just
>> use 486 code for all i386 machines than add a branch, esp. if
>> the "optimization" the branch is doing isn't an actual optimization.

Hmm.

> I do not insist. The branch is done only for arches that has no fence
> instructions. The section for the slow code is put after the main text,
> that should, according to Intel documentation, be predicted as not taken
> for the first execution.
> 
> For reference, below is the latest version of the patch. I postponed
> the commit, in particular, because it touches ixgb, and Jack did
> not responded.
> ...
> diff --git a/sys/i386/include/atomic.h b/sys/i386/include/atomic.h
> index f6bcf0c..dbdc945 100644
> --- a/sys/i386/include/atomic.h
> +++ b/sys/i386/include/atomic.h
> @@ -32,21 +32,47 @@
> #error this file needs sys/cdefs.h as a prerequisite
> #endif
>
> -
> -#if defined(I686_CPU)
> -#define mb()	__asm__ __volatile__ ("mfence;": : :"memory")
> -#define wmb()	__asm__ __volatile__ ("sfence;": : :"memory")
> -#define rmb()	__asm__ __volatile__ ("lfence;": : :"memory")
> -#else
> -/*
> - * do we need a serializing instruction?
> - */
> -#define mb()
> -#define wmb()
> -#define rmb()
> +#if defined(_KERNEL)

Still have this style bug (if defined() instead of ifdef).

> +#include <machine/cpufunc.h>

Namespace pollution.  This should be unnecessary because it is an error
to include <machine/atomic.h> (or <machine/cpufunc.h>) directly, at
least in the kernel.  The correct include is <sys/systm.h> which
supplies pollution by <machine/atomic.h> and <machine/cpufunc.h> and
lots more as standard.  There <machine/atomic.h> is included before
<machine/cpufunc.h>.  This order would now be wrong if you have added
a dependency of <machine/atomic.h> on <machine/cpufunc.h>, but I can't
see such a dependency...

> +#include <machine/specialreg.h>

... now I see it -- it is because <machine/specialreg.h> is broken and
has a dependency on <machine/cpufunc.h>.  <machine/specialreg.h> is
supposed to contain only #define's for special registers, but it has
2 inline functions that use functions from <machine/cpufunc.h>, without
even using #defines for the special registers that they use.  Normally
<machine/specialreg.h> can be included without worrying about this
dependency, thanks to the standard pollution, but now the standard
pollution is broken.

Namespace pollution.  This is almost necessary for CPUID_* (it would
be necessary if these macros were inlines, but with macros only
files that use the macros would need the include and there might be
few enough of them for this to be done for each, or <sys/systm.h>
could do it (ugh) and hthe order doesn't matter.

Another advantage of 'lock addl' is that CPUID_* are not needed.
Atomic ops would need the same treatment if they used *fence.

> +#define	mb()	__asm __volatile(			\
> +	"testl	%0,cpu_feature				\n\
> +	je	2f					\n\
> +	mfence						\n\
> +1:							\n\

Still have this style bug (no empty line before label that is not
fallen into).  I wasn't going to care about this, but then decided
that the non-fall-through here is especially non-obvious.

> +	.section .text.offpath				\n\
> +2:	lock						\n\

Still have this style bug (label on same line as instruction).

> +	addl	$0,cpu_feature				\n\

I wonder if adding to (%esp) is faster.  It is at least smaller.

> +	jmp	1b					\n\

Missing empty line as above.

> +	.text						\n\
> +"							\
> +	: : "i"(CPUID_SSE2) : "memory")

Missing space after "i" here and elsewhere.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20081202232222.N1514>