Date: Sun, 12 Nov 2017 17:48:33 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Mateusz Guzik
cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r325734 - head/sys/amd64/amd64
In-Reply-To: <201711120313.vAC3D1o4074273@repo.freebsd.org>
References: <201711120313.vAC3D1o4074273@repo.freebsd.org>
Message-ID: <20171112151145.Q1144@besplex.bde.org>

On Sun, 12 Nov 2017, Mateusz Guzik wrote:

> Log:
>   amd64: stop nesting preemption counter in spinlock_enter
>
>   Discussed with:	jhb

This seems to break it.  i386 still has the old code.

It also moves the critical section a little, so that it is inconsistent
for enter/exit.  I think the critical section placement was harmlessly
wrong, and moving it further fixes the new bugs too.

> Modified: head/sys/amd64/amd64/machdep.c
> ==============================================================================
> --- head/sys/amd64/amd64/machdep.c	Sun Nov 12 02:34:33 2017	(r325733)
> +++ head/sys/amd64/amd64/machdep.c	Sun Nov 12 03:13:01 2017	(r325734)
> @@ -1853,9 +1853,9 @@ spinlock_enter(void)
>  		flags = intr_disable();
>  		td->td_md.md_spinlock_count = 1;
>  		td->td_md.md_saved_flags = flags;
> +		critical_enter();
>  	} else
>  		td->td_md.md_spinlock_count++;
> -	critical_enter();
>  }
>
>  void

The main broken case is:
- both levels initially 0
- disable interrupts
- raise spinlock count to 1
- bad race window until critical_enter() is called.

Disabling hardware interrupts doesn't prevent exceptions like debugger
traps or NMIs.  Debugger trap handlers shouldn't use critical sections
or (spin) mutexes, but NMI handlers might.  When an exception handler
calls spinlock_enter(), this no longer gives a critical section, and bad
things like context switches occur if the handler calls critical_enter/
exit().
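Spelling the function out makes the window easier to see.  This is
spinlock_enter() as it reads after r325734, with the lines outside the
hunk filled in from the surrounding file; the comment is only an
annotation of the window described above, not part of the commit:

void
spinlock_enter(void)
{
	struct thread *td;
	register_t flags;

	td = curthread;
	if (td->td_md.md_spinlock_count == 0) {
		flags = intr_disable();
		td->td_md.md_spinlock_count = 1;
		/*
		 * Window: interrupts are off, but an NMI or debugger trap
		 * can still arrive here.  Its nested spinlock_enter() sees
		 * md_spinlock_count != 0, takes the else branch, and
		 * returns without any critical section in effect.
		 */
		td->td_md.md_saved_flags = flags;
		critical_enter();
	} else
		td->td_md.md_spinlock_count++;
}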
The main old race is:
- as above, except the race is not so bad (I think it is harmless).
  Nested calls to spinlock_enter() can still occur, but at all levels,
  spinlock_enter() never returns without entering a critical section,
  so callers aren't affected.  I think the worst that happens is that
  when a nested spinlock_exit() lowers the critical level to 0, context
  switches may occur.  This isn't a large problem since higher
  spinlock_enter() levels are still entered -- they have disabled
  interrupts, but nothing depends on this.

See r214835 for larger bugs related to the old race.  Exceptions cannot
be prevented, and r214835 defends against them only for the spinlock
context.  Defending against preemption is even easier.  Just move the
critical_enter() to before disabling interrupts.

> @@ -1865,11 +1865,12 @@ spinlock_exit(void)
>  	register_t flags;
>
>  	td = curthread;
> -	critical_exit();
>  	flags = td->td_md.md_saved_flags;
>  	td->td_md.md_spinlock_count--;
> -	if (td->td_md.md_spinlock_count == 0)
> +	if (td->td_md.md_spinlock_count == 0) {
> +		critical_exit();
>  		intr_restore(flags);
> +	}
>  }
>
>  /*

This used to call critical_exit() at the beginning.  This corresponds
to calling critical_enter() at the end of spinlock_enter(), and gives
the same race.  Now it calls critical_exit() after lowering the spinlock
count.  This closes only half of the old race window.  After moving the
call to critical_enter() earlier, moving the call to critical_exit()
later to match closes all of the old race window.  (A sketch of both
functions with that ordering is at the end of this mail.)

The new bug only affects spinlock entry.  For exit, there is just the
old race with a reduced window.

I don't like this change.  The nested counting is easier to understand,
and the nested case is very rare and the critical section part of it is
very efficient (then critical_exit() is just lowering the level).

Old versions of these functions were written to try to reduce branches
and other bad scheduling for the usual non-nested case.  Perhaps this
isn't done right.  It is intentionally not obfuscated with __predict_*().
spinlock_exit() ended up without the optimization(?) of avoiding the
decrement in the usual case, whereas spinlock_enter() uses special 0/1
logic for the usual case and needs extra code for the increment in the
unusual case.

Since the slow parts are probably the critical_*() function calls and
the flags changes, perhaps those should be optimized instead.  The old
version tried to do this for the function calls by doing them
unconditionally.  This should have worked best for the call to
critical_exit() since the call is first (but should be last).  The
order is critical, and the flags changes use volatile asms which force
the order more than elsewhere.

I think the nested case is only for recursive spinlocks.  So NMI handlers
should not use any spinlocks and the new bug is small (NMI handlers
should not use non-recursive spinlocks since they would deadlock, and
should not use recursive spinlocks since they don't work).  NMI handlers
aren't that careful.  They call printf(), and even the message buffer
has been broken to use non-recursive spinlocks.

Bruce
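For concreteness, a minimal sketch of the ordering argued for above --
critical_enter() before intr_disable(), and critical_exit() moved after
intr_restore() to match.  This is only an illustration of the suggestion
in this mail (untested, with the unchanged lines filled in from the
file), not what was committed:

void
spinlock_enter(void)
{
	struct thread *td;
	register_t flags;

	td = curthread;
	if (td->td_md.md_spinlock_count == 0) {
		critical_enter();	/* first, before interrupts go off */
		flags = intr_disable();
		td->td_md.md_spinlock_count = 1;
		td->td_md.md_saved_flags = flags;
	} else
		td->td_md.md_spinlock_count++;
}

void
spinlock_exit(void)
{
	struct thread *td;
	register_t flags;

	td = curthread;
	flags = td->td_md.md_saved_flags;
	td->td_md.md_spinlock_count--;
	if (td->td_md.md_spinlock_count == 0) {
		intr_restore(flags);
		critical_exit();	/* last, mirroring the enter side */
	}
}

With this order, spinlock_enter() never returns to its caller outside a
critical section, even for a nested call from an NMI handler, and
spinlock_exit() stays inside the critical section until the saved flags
have been restored.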