From owner-svn-src-head@freebsd.org Wed Mar 21 18:31:22 2018 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6061EF5B080; Wed, 21 Mar 2018 18:31:22 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 9D3ED7F260; Wed, 21 Mar 2018 18:31:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 0683C42CD90; Thu, 22 Mar 2018 05:31:18 +1100 (AEDT) Date: Thu, 22 Mar 2018 05:31:18 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Warner Losh , src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r331298 - head/sys/dev/syscons In-Reply-To: <20180322024846.S4293@besplex.bde.org> Message-ID: <20180322050628.U5175@besplex.bde.org> References: <201803211447.w2LElDcK091988@repo.freebsd.org> <20180322024846.S4293@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=VJytp5HX c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=_9ZPQwydy-ByBSi8r-QA:9 a=CjuIK1q_8ugA:10 X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Mar 2018 18:31:22 -0000 On Thu, 22 Mar 2018, Bruce Evans wrote: > On Wed, 21 Mar 2018, Warner Losh wrote: > >> Log: >> Unlock giant when calling shutdown_nice() > ... > This breaks the driver. Giant is syscons' driver lock, and also the > interrupt handler lock for at least the atkbd keyboard driver, so vt > sometimes holds the lock for. [That should have been "... lock too".] > ... > Actual testing shows that doesn't cause a panic, but it also doesn't > actually unlock for shutdown_nice(), since the lock is acquired twice > and only released once. syscons has much the same extra lock sprinkling > for event handling as vt: > > - intr_event_execute_handlers() acquires Giant and calls atkbdintr() > - atkbdintr() calls sckbdevent() > - sckbdevent() unnecessarily acquires Giant again > - the buggy unlocking drops Giant just once > - shutdown_nice() is called with Giant held > - the buggy unlocking fails to re-acquire Giant > - sckbdevent() releases Giant, leaving it not held > - sckbdevent() returns > - atkbdintr() returns > - intr_event_execute_handlers() releases Giant. This should panic, but > it apparently blocks for long enough for init to shut down first. > > When I trace the last step, I get a panic which might be either from the > different timing or just a bug in kdb. Testing with a kernel with other bugs fixed shows that the invariants violation really is detected, causing a panic instead of a nice shutdown for Ctrl-Alt-Del, but recursive panics pile up and printf() in -current is too broken to print anything. A better kernel prints: XX ppppppppppppppppppppppppppkernel trap 12 with interrupts disabled XX panic: kernel trap doesn't have ucred XX cpuid = 32 XX time = 1521655530 XX KDB: enter: panic XX panic: kernel trap doesn't have ucred XX cpuid = 32 XX time = 1521655530 XX KDB: enter: panic XX panic: kernel trap doesn't have ucred XX cpuid = 32 XX time = 1521655530 XX KDB: enter: panic XX panic: kernel trap doesn't have ucrepanic: kernel trap doesn't have ucred XX cpuid = 32 XX time = 1521655530 XX KDB: enter: panic where some messages are lost and some are duplicated (there are 8 CPUs). I got control by putting a breakpoint after stop_cpus_hard() in vpanic(). (A breakpoint at panic() crashes, probably for multiple CPUs hitting it, though this is supposed to be fixed in the test version.) The backtrace is then: XX Breakpoint at vpanic+0x4a: popl %ecx XX db> t XX Tracing pid 11 tid 100042 td 0xd6a61360 XX vpanic(c0916c91,d6685ab4,d6685ab4,c098f050,c098f040,...) at vpanic+0x4a/frame 0xd6685a8c XX kassert_panic(c0916c91,c093b956,c0930a24,c09066cf,557) at kassert_panic+0x49/frame 0xd6685aa8 XX witness_unlock(c098f040,8,c09066c6,557) at witness_unlock+0xe7/frame 0xd6685af0 XX __mtx_unlock_flags(c098f050,0,c09066c6,557) at __mtx_unlock_flags+0x65/frame 0xd6685b14 XX intr_event_execute_handlers(c8f466b0) at intr_event_execute_handlers+0xed/frame 0xd6685b40 XX ithread_execute_handlers(c8f466b0,0,80202,d6a61360,c8f466b0,...) at ithread_execute_handlers+0x21/frame 0xd6685b54 XX ithread_loop(d6cda250,d6685ba8,0,d6cda250,c06a2c56,...) at ithread_loop+0x5f/frame 0xd6685b74 XX fork_exit(c06a2c56,d6cda250,d6685ba8) at fork_exit+0x83/frame 0xd6685b94 XX fork_trampoline() at fork_trampoline+0x8 The stack trace is messed up are mis-decoded despite attempts to avoid this (use i386, don't use clang, and turn off auto-inlining...). panic() is not shown, and args passed in registers are not shown. There are just enough args to find the panic message. It is as expected: XX db> x/s 0xd6685ab4 XX 0xd6685ab4: V\271\223\300$\012\223\300\317f\220\300W\005 XX db> x/s 0xc0916c91 XX __func__.16466+0x1c41: lock (%s) %s not locked @ %s:%d Bruce