Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 29 Oct 2012 13:06:20 +0000
From:      Attilio Rao <attilio@freebsd.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>, Jeff Roberson <jeff@freebsd.org>, Florian Smeets <flo@freebsd.org>, Bruce Evans <bde@freebsd.org>, svn-src-projects@freebsd.org
Subject:   Re: svn commit: r238907 - projects/calloutng/sys/kern
Message-ID:  <CAJ-FndAyPVB8VS%2BzNZTUfVhSp9hSOZOjamBwxhhikq3gSMfs3g@mail.gmail.com>
In-Reply-To: <20121029155136.O943@besplex.bde.org>
References:  <201207301350.q6UDobCI099069@svn.freebsd.org> <CAJ-FndBj8tpC_BJXs_RH8sG2TBG8yA=Lxu3-GTVT9Ap_zOCuVQ@mail.gmail.com> <CAJ-FndDnO7wjnWPV0tTu%2BUGHjsxa3YDarMxmyei3ZmjLAFvRkQ@mail.gmail.com> <201207301732.33474.jhb@freebsd.org> <CAJ-FndD5EO12xsWOAe6u0EvX00q33wxO4OivnGjzj0=T2Oe8uA@mail.gmail.com> <CAJ-FndCRg0UCThFkatp=tw7rUWWCvhsApLE=iztLpxpGBC1F9w@mail.gmail.com> <CAJ-FndBqV2uD8Th9ePtxyJwhMAPzY3AXA5cQ7HszLp=%2BfSpHTA@mail.gmail.com> <CAJ-FndDPLmkpAJeGVN2wgbhdgHYezyUV-PPvH9e-CA7Go7HG3A@mail.gmail.com> <20121029155136.O943@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 10/29/12, Bruce Evans <brde@optusnet.com.au> wrote:
> On Mon, 29 Oct 2012, Attilio Rao wrote:
>
>> Now that sched_pin()/sched_unpin() are fixed I would like to introduce
>> this new patch, making critical_enter()/critical_exit() inline:
>> http://www.freebsd.org/~attilio/inline_critical.patch
>>
>> The concept is pretty simple: simple add/dec for critical_enter, exit
>> are inlined, the rest is in an "hard path". Debugging enables the hard
>> paths by default (really I think that only KTR may be due here, but I
>> thought that in case of INVARIANTS this was also wanted, so I added
>> the check also for that case).
>>
>> flo@ has stress-tested the patch already and he may be starting
>> serious benchmarks soon.
>> The effectiveness of this patch is to determine and it brings again
>> the question: better an hard function or the usage of compiler membars
>> to avoid issues? I think that this patch should be in only if we can
>> absolutely prove a measurable performance bump, otherwise I will just
>> add a note to critical_enter()/critical_exit() explaining why they
>> should not be inlined at all.
>
> Inlining of mtx_lock_spin() is bogus unless critical_enter() is inlined.
> Similarly for mtx_unlock_spin() and critical_exit().  It saves 1 function
> call. but critical_enter() does a function call anyway.  critical_exit*(
> also has a branch in branch in it that might cost more than the function
> call just for mispredicting it.

Correct, that is a further argument for having inlined
critical_enter(), even if the actual calling cames from
spinlock_enter(), not critical_enter(), which must be MD (that's on
FreeBSD, not sure what happens in your OS).

> My version goes the other way and uninlines mtx_lock_spin() and
> mtx_unlock_spin().  Then it inlines (open codes) critical_enter() and
> critical_exit() in them.  This saves a lot of text space and thus
> hopefully saves time too.  I couldn't find any cases where it either
> saved or lost a significant amount of time.  The main cause of time
> differences is probably mispredicted branches: with mtx*() inlined,
> the branches in it can be predicted separately, so they are more
> predictable.  However, when they are separate, there may be enough to
> thrash the branch predictor.  It is probably best to only inline mtx*()
> for a few heavily-used locks and only inline critical*() if it is either
> in one of these or is in a non-inline function.  Otherwise, inlining
> mainly gives code bloat that is only harmless if the code is rarely
> actually used.  But this is hard to configure.  <sys/mutex.h> is
> already a mess to support uninlining for debugging cases.
>
> OTOH, I couldn't get uninlining of mtx_lock() and mtx_unlock() to work.
> These are a little smaller than the spinlock versions, mainly since they
> don't inline critical*().  The loss from unlining them cannot be
> compensated for by uninlining critical*() in them, and I found just 1
> workload where uninlining them was slower: udp packets per second bandwidth
> tests.

I don't think that uninling mtx_lock()/unlock() (btw, on which hw are
you testing them if you are still able to catch performance penalties
by branch misprediction?!) is a good idea, likely what would be a
better one is to both inline critical_enter() and spinlock_enter().

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndAyPVB8VS%2BzNZTUfVhSp9hSOZOjamBwxhhikq3gSMfs3g>