From: John Baldwin
To: gljennjohn@gmail.com
Cc: current@freebsd.org
Subject: Re: EARLY_AP_STARTUP hangs during boot
Date: Sat, 30 Jul 2016 12:03:59 -0700
Message-ID: <8097239.52FCHCROUA@ralph.baldwin.cx>
In-Reply-To: <20160730094422.68e1b8db@ernst.home>
References: <20160516122242.39249a54@ernst.home> <2732687.Cf9hD9SkSs@ralph.baldwin.cx> <20160730094422.68e1b8db@ernst.home>
List-Id: Discussions about the use of FreeBSD-current

On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:
> On Fri, 29 Jul 2016 13:17:42 -0700
> John Baldwin wrote:
>
> > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:
> > > Well, now I know that ULE is a prerequisite for EARLY_AP_STARTUP!  I
> > > wasn't aware of that.  I prefer BSD and that's the scheduler I did
> > > the first tests with.
> > >
> > > But with the ULE scheduler the system comes up all the way.
> > >
> > > It would be nice if the BSD scheduler could also be modified to
> > > work with EARLY_AP_STARTUP.
> >
> > I wasn't able to reproduce your hang with 4BSD, but I think I see a
> > possible problem.  Try this:
> >
> > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > index 7de56b6..d53331a 100644
> > --- a/sys/kern/sched_4bsd.c
> > +++ b/sys/kern/sched_4bsd.c
> > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> >  	 *  - The current thread has a higher (numerically lower) or
> >  	 *    equivalent priority.  Note that this prevents curthread from
> >  	 *    trying to preempt to itself.
> > -	 *  - It is too early in the boot for context switches (cold is set).
> >  	 *  - The current thread has an inhibitor set or is in the process of
> >  	 *    exiting.  In this case, the current thread is about to switch
> >  	 *    out anyways, so there's no point in preempting.  If we did,
> > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> >  	    ("maybe_preempt: trying to run inhibited thread"));
> >  	pri = td->td_priority;
> >  	cpri = ctd->td_priority;
> > -	if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||
> > +	if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> >  	    TD_IS_INHIBITED(ctd))
> >  		return (0);
> >  #ifndef FULL_PREEMPTION
> > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> >  	if ((!forward_wakeup_enabled) ||
> >  	     (forward_wakeup_use_mask == 0 && forward_wakeup_use_loop == 0))
> >  		return (0);
> > -	if (!smp_started || cold || panicstr)
> > +	if (!smp_started || panicstr)
> >  		return (0);
> >
> >  	forward_wakeups_requested++;
> >
>
> Thanks, but with this patch the kernel hangs in exactly the same
> place as before - after the HPET output.
>
> Maybe I'm missing some kernel option which ULE works around, or
> something like that.

Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to the KTR masks, that is
'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'

Please also add this patch (on top of the previous patch):

diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
index 2973a23..bab2278 100644
--- a/sys/kern/sched_4bsd.c
+++ b/sys/kern/sched_4bsd.c
@@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags)
 	KASSERT(td->td_flags & TDF_INMEM,
 	    ("sched_add: thread swapped out"));

+	CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid,
+	    sched_tdname(td));
 	KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq add",
 	    "prio:%d", td->td_priority, KTR_ATTR_LINKED,
 	    sched_tdname(curthread));
diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c
index f07b97e..1f418f1 100644
--- a/sys/x86/x86/cpu_machdep.c
+++ b/sys/x86/x86/cpu_machdep.c
@@ -440,6 +440,7 @@ cpu_idle_wakeup(int cpu)
 		return (0);
 	if (*state == STATE_MWAIT)
 		*state = STATE_RUNNING;
+	CTR1(KTR_PROC, "cpu_idle_wakeup: wokeup CPU %d", cpu);
 	return (1);
 }

(I haven't tried compiling it; you might have to add the sys/ktr.h header
to cpu_machdep.c if it doesn't build.)

Hopefully we will get some better trace messages before it hangs with this
added info.  The root issue seems to be that 4BSD is pinning thread0 to
some other CPU (due to the sched_bind that happens inside of
bus_bind_intr() when the HPET driver pins IRQs to CPUs) and that other CPU
isn't waking up to realize it needs to run thread0.

-- 
John Baldwin
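
For reference, a sketch of how the KTR options requested above might sit in
a custom kernel configuration file.  Only the KTR_COMPILE and KTR_MASK lines
come from this message; the 'options KTR' line and the KTR_ENTRIES value are
assumptions added for illustration and should be checked against ktr(4) for
the tree being built.

```
# Assumption: the KTR facility itself has to be enabled for the masks to matter.
options 	KTR
# Assumption: a larger trace buffer than the default; the value is a power of two.
options 	KTR_ENTRIES=131072
# From the message: compile in and enable the PROC, RUNQ and SMP trace classes.
options 	KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)
options 	KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)
```

Since the machine hangs during boot, the trace buffer would most likely have
to be read from the kernel debugger (for example with 'show ktr' in DDB,
assuming DDB is compiled in) rather than from userland.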