From owner-svn-src-projects@FreeBSD.ORG Sat Nov 24 15:43:29 2012 Return-Path: Delivered-To: svn-src-projects@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 022A2ECF; Sat, 24 Nov 2012 15:43:29 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-la0-f54.google.com (mail-la0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 6AC0D8FC12; Sat, 24 Nov 2012 15:43:26 +0000 (UTC) Received: by mail-la0-f54.google.com with SMTP id j13so9459690lah.13 for ; Sat, 24 Nov 2012 07:43:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=q8msky12cZxfMj7V7Dbx5wOeQ9FmGKtoWoWAbZ3FdME=; b=AdS2dKMh46CWqVrrFydfnzvvtwzmfI1ogcgmzxxUUfgMqhby02mQLauEUMhVQJwsLO UERr8Kzftaz2MTn+ci3chmQyEtA8nfxPI1ei82n+h0ap9cpU8Mg3yJyH5UdYz5CCZV+l l1cPmUHY0kjUFUHhFFbIjBjIwE+TznyF1tOtCiF9vRdXWDkd615inKB9UyoRp2Xp1/SZ Aacdly30/7Ax6xTHrztyEoXyCuNbHQv11EfB4+oVs33JW806m7lDHuonoP04b61uJk8Q DJvK3AEFUdAUhQOubkwuziJjC+rD5Rw3OKpeF5w7gyHx8iG96rN/+qCzK2g7+nXFZXA3 E2jg== MIME-Version: 1.0 Received: by 10.152.104.50 with SMTP id gb18mr6424578lab.9.1353771806090; Sat, 24 Nov 2012 07:43:26 -0800 (PST) Sender: asmrookie@gmail.com Received: by 10.112.134.5 with HTTP; Sat, 24 Nov 2012 07:43:26 -0800 (PST) In-Reply-To: <20121109034942.C5338@besplex.bde.org> References: <201207301350.q6UDobCI099069@svn.freebsd.org> <201207301732.33474.jhb@freebsd.org> <20121029155136.O943@besplex.bde.org> <20121109034942.C5338@besplex.bde.org> Date: Sat, 24 Nov 2012 15:43:26 +0000 X-Google-Sender-Auth: rzYqKIwVjZsP611JMYi_sF7e2Pw Message-ID: Subject: Re: svn commit: r238907 - projects/calloutng/sys/kern From: Attilio Rao To: Bruce Evans Content-Type: text/plain; charset=UTF-8 Cc: src-committers@freebsd.org, John Baldwin , Jeff Roberson , Florian Smeets , Bruce Evans , svn-src-projects@freebsd.org X-BeenThere: svn-src-projects@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: attilio@FreeBSD.org List-Id: "SVN commit messages for the src " projects" tree" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 24 Nov 2012 15:43:29 -0000 On Thu, Nov 8, 2012 at 5:26 PM, Bruce Evans wrote: > On Fri, 2 Nov 2012, Attilio Rao wrote: > >> On 10/29/12, Bruce Evans wrote: >>> >>> On Mon, 29 Oct 2012, Attilio Rao wrote: >>> >>>> Now that sched_pin()/sched_unpin() are fixed I would like to introduce >>>> this new patch, making critical_enter()/critical_exit() inline: >>>> http://www.freebsd.org/~attilio/inline_critical.patch >>> >>> ... >>> >>> My version goes the other way and uninlines mtx_lock_spin() and >>> mtx_unlock_spin(). Then it inlines (open codes) critical_enter() and >>> ... >> >> >> So, I thought more about this and I think that inlining >> critical_exit() is not really going to bring any benefit here but >> bloat. >> This is because nested critical sections are rare rather not, which > > > Rather rare !not? :-) > > >> means you will end up in doing the function call most of the time and >> plus we have the possible pessimization introduced by the memory >> clobber (__compiler_membar()) and as you note possible deficiency >> caming from the branch misprediction*. > > > This seems best. > > I see a point about the rareness of the branch in critical_exit(). > Not sure if it is the one you are making: since the nested case is > rare, then the branch will normally be correctly predicted. If the > function remains uninlined, then the branch still has a good chance > of being correctly predicted. This depends on the nested case being > so rare across all callers, that the non-nested case doesn't mess > up the prediction by happening often. The branch predictor only > has to maintain history for 1 branch for this. However, if the > call is inlined and there are many callers, there might be too many > to maintain history for them all. Yes, that's basically the same conclusion I came up with. It seems you are not opposed to this version of the patch. I made some in-kernel benchmarks but they didn't really show a performance improvements, bigger than 1-2%, which on SMP system is basically accountable to thirdy-part effects. However we already employ the same code for sched_pin() and I then think that we can just commit this patch as-is. > >> I feel very unsure style-wise about my add to sys/systm.h because it >> is already very messy that I cannot give a logical sense to that >> chunk. > > > It is less bad than most places. Some of the inlines in libkern.h > are misplaced. However, the bitcount inlines in systm.h belong > closer to libkern (or /dev/null). The split between systm.h and > kernel.h is bogus. But systemy things like critical* don't > belong near libkern. > > >> * In theory we could use __predict_false() or similar but I don't >> think this will really help as x86 doesn't have explicit instructions >> to control branch prediction > > > It can help as follows on x86: > - modern x86 has a default for when the branch predictor is cold. The > compiler should arrange the code so that the default matches the > usual case. Compilers can either follow the code order or guess > which branch is usual (-fpredict-branch-probability IIRC). The > branches in critical_exit() and x86 spinlock_enter() seem to have > a good order for the first, but might be mispredicted by the compiler. > The branch in spinlock_exit() seems to be in a bad order for the first. > This doesn't matter if the CPU's branch predictor stays warm. > - compilers can help the CPU's branch predictor and instruction prefetcher > by moving the unusual case far away. This helps mainly for large code. > The code in critical_exit() is not large, except possibly if it is > inlined, > > But __predict_false() never did much for me, perhaps because I test mainly > non-large code in loops where branch predictors stay warm. Shouldn't large code benefit less likely by correct branch prediction? As the pipeline are not huge wrongly predicting a branch should have no effect if the jump label is far enough to not be pre-fetched. Anyway, I never saw __predict_false() giving a measurable performance bump on x86, neither x86 has explicit instructions to control it. That's what I wanted to actually say. Attilio -- Peace can only be achieved by understanding - A. Einstein