From: Attilio Rao
Reply-To: attilio@FreeBSD.org
Date: Mon, 29 Oct 2012 15:43:47 +0000
To: Bruce Evans
Cc: src-committers@freebsd.org, John Baldwin, Jeff Roberson, Florian Smeets,
    Bruce Evans, svn-src-projects@freebsd.org
Subject: Re: svn commit: r238907 - projects/calloutng/sys/kern

On 10/29/12, Bruce Evans wrote:
> On Mon, 29 Oct 2012, Attilio Rao wrote:
>
>> On 10/29/12, Bruce Evans wrote:
>>> On Mon, 29 Oct 2012, Attilio Rao wrote:
>>>
>>>> Now that sched_pin()/sched_unpin() are fixed I would like to introduce
>>>> this new patch, making critical_enter()/critical_exit() inline:
>>>> http://www.freebsd.org/~attilio/inline_critical.patch
>>>>
>>>> The concept is pretty simple: the simple add/dec for critical_enter()
>>>> and critical_exit() are inlined, and the rest lives in a "hard path".
>>>> Debugging enables the hard paths by default (really I think only KTR
>>>> needs this, but I thought it was also wanted in the INVARIANTS case,
>>>> so I added the check for that case too).
>>> ...
>>> Inlining of mtx_lock_spin() is bogus unless critical_enter() is inlined.
>>> Similarly for mtx_unlock_spin() and critical_exit().  It saves 1 function
>>> call, but critical_enter() does a function call anyway.  critical_exit()
>>> also has a branch in it that might cost more than the function call just
>>> for mispredicting it.
>>
>> Correct, that is a further argument for having inlined
>> critical_enter(),
>
> And for inlining neither, or the opposite one like I do.
>
>> even if the actual calling comes from
>> spinlock_enter(), not critical_enter(), which must be MD (that's on
>> FreeBSD, not sure what happens in your OS).
>
> I forgot that I don't have the slow functions spinlock_enter() and
> spinlock_exit() in mtx_[un]lock_spin().  (My mutexes don't block
> interrupts, as required for fast interrupt handling that is actually
> fast (really low-latency).  My spinlocks just use critical*(), and
> critical*() doesn't block fast interrupt handling.)
>
> The spinlock_enter() calls mean that inlining mutex calls is even more
> bogus.  Instead of just 1 function call which does not much more than
> increment or decrement a counter, there is a nested call to critical*()
> and another call to spinlock_enter().  spinlock_enter() is MD and
> might need to do lots of slow hardware things.  spinlock_enter() does
> the following on i386:
>
> % void
> % spinlock_enter(void)
> % {
> % 	struct thread *td;
> % 	register_t flags;
> %
> % 	td = curthread;
> % 	if (td->td_md.md_spinlock_count == 0) {
> % 		flags = intr_disable();
>
> This is a CPU control instruction and thus tends to be slow.  It was very
> slow on Pentium4.  It might involve some serialization, although it is
> not a full serialization instruction.
>
> % 		td->td_md.md_spinlock_count = 1;
> % 		flags &= ~PSL_T;
>
> The previous line is from my version.  It fixes spurious trace traps when
> the flags are popped in critical_exit().  Similar fixes are needed for
> the pushfl/popfl sequences in swtch.s.  The spurious trace traps were
> and might still be more harmful than they should be, since they exercise
> deadlock bugs in syscons and/or printf.  Simply trace through a large
> amount of code in ddb, going through here a few times to set up spurious
> trace traps for several td's.  It may also be necessary to have syscons
> and/or printf doing non-ddb i/o.  Eventually the trace traps bite and
> demonstrate the deadlock.
>
> % 		td->td_md.md_saved_flags = flags;
> % 	} else
> % 		td->td_md.md_spinlock_count++;
> % 	critical_enter();
> % }
>
> Everything else uses simple non-control instructions, so it is quite fast.
> However, if this is not serialized, then it can run in parallel with
> mtx_lock_spin() and vice versa, since there are no inter-dependencies.
> It is unclear whether the parallelism is helped or harmed by not
> inlining mtx_lock_spin().
>
>>> My version goes the other way and uninlines mtx_lock_spin() and
>>> mtx_unlock_spin().  Then it inlines (open codes) critical_enter() and
>>> critical_exit() in them.  This saves a lot of text space and thus
>>> hopefully saves time too.  I couldn't find any cases where it either
>>> ...
>>>
>>> OTOH, I couldn't get uninlining of mtx_lock() and mtx_unlock() to work.
>>> ...
>>
>> I don't think that uninlining mtx_lock()/unlock() (btw, on which hw are
>> you testing them if you are still able to catch performance penalties
>> by branch misprediction?!) is a good idea; likely a better one is to
>> inline both critical_enter() and spinlock_enter().
>
> Er, it is a good idea, as explained above.  Whether it is better in
> practice is very MD.  The mtx non-calls are already quite large, and
> adding critical*() and spinlock*() to them would make them larger.
> Above a certain MD size, inlining is just slower because it busts caches.
> spinlock*() is especially hard to inline because it does MD magic that
> might be even larger than the i386 version.
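
To make the concept from the top of the thread concrete, here is a minimal
sketch of the fast/hard split being proposed.  It is illustrative only, not
the actual inline_critical.patch: kernel context (curthread, td_critnest,
td_owepreempt) is assumed, and critical_exit_hard() is a made-up name for the
out-of-line slow path that would carry the preemption, KTR and INVARIANTS
work.

/*
 * Illustrative sketch only, not the actual inline_critical.patch.
 * critical_exit_hard() is a hypothetical out-of-line slow path that
 * handles deferred preemption plus any KTR/INVARIANTS debugging work.
 */
static __inline void
critical_enter_inline(void)
{

	curthread->td_critnest++;	/* fast path: bump the nesting count */
}

static __inline void
critical_exit_inline(void)
{
	struct thread *td;

	td = curthread;
	if (td->td_critnest == 1 && td->td_owepreempt)
		critical_exit_hard(td);	/* "hard" path, kept out of line */
	else
		td->td_critnest--;	/* fast path: drop the nesting count */
}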
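
For comparison, the inlined mtx_lock()/mtx_unlock() fast path discussed below
is essentially a single compare-and-swap on the lock word.  Again a simplified
sketch, not the real sys/mutex.h macros; mtx_lock_slow() and mtx_unlock_slow()
are stand-in names for the contested out-of-line paths.

/*
 * Simplified sketch of the inlined adaptive-mutex fast path; not the
 * actual sys/mutex.h macros.
 */
static __inline void
mtx_lock_inline(struct mtx *m)
{
	uintptr_t tid = (uintptr_t)curthread;

	/* Uncontested case: one atomic compare-and-swap, no other calls. */
	if (!atomic_cmpset_acq_ptr(&m->mtx_lock, MTX_UNOWNED, tid))
		mtx_lock_slow(m, tid);		/* contested: out of line */
}

static __inline void
mtx_unlock_inline(struct mtx *m)
{
	uintptr_t tid = (uintptr_t)curthread;

	/* Uncontested case: a single atomic release of the lock word. */
	if (!atomic_cmpset_rel_ptr(&m->mtx_lock, tid, MTX_UNOWNED))
		mtx_unlock_slow(m, tid);	/* waiters present: out of line */
}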
You are misunderstanding: mtx_lock()/unlock() don't call
spinlock_enter()/spinlock_exit(), so their inlined fast path amounts to
little more than a single atomic operation.  That should not be wrapped in a
function call, IMHO.  (If your OS does something quite different I wouldn't
know; I don't have its sources at hand, and it is sometimes difficult to
follow you because you mix FreeBSD behaviour with your-OS behaviour.)

Attilio

--
Peace can only be achieved by understanding - A. Einstein