From: Attilio Rao
Reply-To: attilio@FreeBSD.org
Date: Mon, 29 Oct 2012 15:43:47 +0000
To: Bruce Evans
Cc: src-committers@freebsd.org, John Baldwin, Jeff Roberson, Florian Smeets,
    Bruce Evans, svn-src-projects@freebsd.org
Subject: Re: svn commit: r238907 - projects/calloutng/sys/kern

On 10/29/12, Bruce Evans wrote:
> On Mon, 29 Oct 2012, Attilio Rao wrote:
>
>> On 10/29/12, Bruce Evans wrote:
>>> On Mon, 29 Oct 2012, Attilio Rao wrote:
>>>
>>>> Now that sched_pin()/sched_unpin() are fixed I would like to introduce
>>>> this new patch, making critical_enter()/critical_exit() inline:
>>>> http://www.freebsd.org/~attilio/inline_critical.patch
>>>>
>>>> The concept is pretty simple: the simple add/dec for critical_enter()
>>>> and critical_exit() are inlined, and the rest lives in a "hard path".
>>>> Debugging enables the hard paths by default (really I think only KTR
>>>> needs this, but I thought it was also wanted in the INVARIANTS case,
>>>> so I added the check for that case too).
>>> ...
>>> Inlining of mtx_lock_spin() is bogus unless critical_enter() is inlined.
>>> Similarly for mtx_unlock_spin() and critical_exit().  It saves 1 function
>>> call, but critical_enter() does a function call anyway.  critical_exit()
>>> also has a branch in it that might cost more than the function call just
>>> for mispredicting it.
>>
>> Correct, that is a further argument for having inlined
>> critical_enter(),
>
> And for inlining neither, or the opposite one like I do.
>
>> even if the actual calling comes from
>> spinlock_enter(), not critical_enter(), which must be MD (that's on
>> FreeBSD, not sure what happens in your OS).
>
> I forgot that I don't have the slow functions spinlock_enter() and
> spinlock_exit() in mtx_[un]lock_spin().  (My mutexes don't block
> interrupts, as required for fast interrupt handling that is actually
> fast (really low-latency).  My spinlocks just use critical*(), and
> critical*() doesn't block fast interrupt handling.)
>
> The spinlock_enter() calls mean that inlining mutex calls is even more
> bogus.  Instead of just 1 function call which does not much more than
> increment or decrement a counter, there is a nested call to critical*()
> and another call to spinlock_enter().  spinlock_enter() is MD and
> might need to do lots of slow hardware things.  spinlock_enter() does
> the following on i386:
>
> % void
> % spinlock_enter(void)
> % {
> % 	struct thread *td;
> % 	register_t flags;
> %
> % 	td = curthread;
> % 	if (td->td_md.md_spinlock_count == 0) {
> % 		flags = intr_disable();
>
> This is a CPU control instruction and thus tends to be slow.  It was very
> slow on Pentium4.  It might involve some serialization, although it is
> not a full serialization instruction.
>
> % 		td->td_md.md_spinlock_count = 1;
> % 		flags &= ~PSL_T;
>
> The previous line is from my version.  It fixes spurious trace traps when
> the flags are popped in critical_exit().  Similar fixes are needed for
> the pushfl/popfl sequences in swtch.s.  The spurious trace traps were
> and might still be more harmful than they should be, since they exercise
> deadlock bugs in syscons and/or printf.  Simply trace through a large
> amount of code in ddb, going through here a few times to set up spurious
> trace traps for several td's.  It may also be necessary to have syscons
> and/or printf doing non-ddb i/o.  Eventually the trace traps bite and
> demonstrate the deadlock.
>
> % 		td->td_md.md_saved_flags = flags;
> % 	} else
> % 		td->td_md.md_spinlock_count++;
> % 	critical_enter();
> % }
>
> Everything else uses simple non-control instructions, so it is quite fast.
> However, if this is not serialized, then it can run in parallel with
> mtx_lock_spin() and vice versa, since there are no inter-dependencies.
> It is unclear whether the parallelism is helped or harmed by not
> inlining mtx_lock_spin().
>
>>> My version goes the other way and uninlines mtx_lock_spin() and
>>> mtx_unlock_spin().  Then it inlines (open codes) critical_enter() and
>>> critical_exit() in them.  This saves a lot of text space and thus
>>> hopefully saves time too.  I couldn't find any cases where it either
>>> ...
>>>
>>> OTOH, I couldn't get uninlining of mtx_lock() and mtx_unlock() to work.
>>> ...
>>
>> I don't think that uninlining mtx_lock()/unlock() (btw, on which hw are
>> you testing them if you are still able to catch performance penalties
>> by branch misprediction?!) is a good idea; likely a better one is to
>> inline both critical_enter() and spinlock_enter().
>
> Er, it is a good idea, as explained above.  Whether it is better in
> practice is very MD.  The mtx non-calls are already quite large, and
> adding critical*() and spinlock*() to them would make them larger.
> Above a certain MD size, inlining is just slower because it busts caches.
> spinlock*() is especially hard to inline because it does MD magic that
> might be even larger than the i386 version.
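
To make the concept from the top of the thread concrete, here is a minimal
sketch of the fast/hard split being proposed.  It is illustrative only, not
the actual inline_critical.patch: kernel context (curthread, td_critnest,
td_owepreempt) is assumed, and critical_exit_hard() is a made-up name for the
out-of-line slow path that would carry the preemption, KTR and INVARIANTS
work.

/*
 * Illustrative sketch only, not the actual inline_critical.patch.
 * critical_exit_hard() is a hypothetical out-of-line slow path that
 * handles deferred preemption plus any KTR/INVARIANTS debugging work.
 */
static __inline void
critical_enter_inline(void)
{

	curthread->td_critnest++;	/* fast path: bump the nesting count */
}

static __inline void
critical_exit_inline(void)
{
	struct thread *td;

	td = curthread;
	if (td->td_critnest == 1 && td->td_owepreempt)
		critical_exit_hard(td);	/* "hard" path, kept out of line */
	else
		td->td_critnest--;	/* fast path: drop the nesting count */
}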
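
For comparison, the inlined mtx_lock()/mtx_unlock() fast path discussed below
is essentially a single compare-and-swap on the lock word.  Again a simplified
sketch, not the real sys/mutex.h macros; mtx_lock_slow() and mtx_unlock_slow()
are stand-in names for the contested out-of-line paths.

/*
 * Simplified sketch of the inlined adaptive-mutex fast path; not the
 * actual sys/mutex.h macros.
 */
static __inline void
mtx_lock_inline(struct mtx *m)
{
	uintptr_t tid = (uintptr_t)curthread;

	/* Uncontested case: one atomic compare-and-swap, no other calls. */
	if (!atomic_cmpset_acq_ptr(&m->mtx_lock, MTX_UNOWNED, tid))
		mtx_lock_slow(m, tid);		/* contested: out of line */
}

static __inline void
mtx_unlock_inline(struct mtx *m)
{
	uintptr_t tid = (uintptr_t)curthread;

	/* Uncontested case: a single atomic release of the lock word. */
	if (!atomic_cmpset_rel_ptr(&m->mtx_lock, tid, MTX_UNOWNED))
		mtx_unlock_slow(m, tid);	/* waiters present: out of line */
}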
You are misunderstanding: mtx_lock()/unlock() don't call
spinlock_enter()/spinlock_exit(), so their inlined fast path amounts to
little more than a single atomic operation.  That should not be wrapped in a
function call, IMHO.  (If your OS does something quite different I wouldn't
know; I don't have its sources at hand, and it is sometimes difficult to
follow you because you mix FreeBSD behaviour with your-OS behaviour.)

Attilio

--
Peace can only be achieved by understanding - A. Einstein