From owner-freebsd-bugs@FreeBSD.ORG Fri May 29 14:32:41 2015 Return-Path: Delivered-To: freebsd-bugs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8343BD71 for ; Fri, 29 May 2015 14:32:41 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 2E9291F89 for ; Fri, 29 May 2015 14:32:40 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 32B6E3C41FC for ; Sat, 30 May 2015 00:32:16 +1000 (AEST) Date: Sat, 30 May 2015 00:32:15 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org cc: freebsd-bugs@freebsd.org Subject: Re: [Bug 200493] Killing pid 11 (idle) wedges the disk IO In-Reply-To: Message-ID: <20150529232657.L2299@besplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=L/MkHYj8 c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=9cW_t1CCXrUA:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=6I5d2MoRAAAA:8 a=BhfgTazteIY126q7_P0A:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 29 May 2015 14:32:41 -0000 On Fri, 29 May 2015 bugzilla-noreply@freebsd.org wrote: > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200493 > > --- Comment #5 from Konstantin Belousov --- > (In reply to Edward Tomasz Napierala from comment #4) > Great, you nailed it down. > > Look at the tdsigwakeup(), which is called from the main code to send a signal > to a thread tdsendsignal(): > > /* > * Bring the priority of a thread up if we want it to get > * killed in this lifetime. > */ > if (action == SIG_DFL && (prop & SA_KILL) && td->td_priority > PUSER) > sched_prio(td, PUSER); That's a funny bug. But why would the priority even be looked at for CPU idle threads? idprio 31 idle threads have the same priority as CPU idle threads at their unclobbered priorities. The CPU idle threads should never run in preference to others. Bumping the priority to PUSER after a signal seems wrong for all sorts of idle threads. For non-timesharing threads, it is hard to get back to the original priority. I fixed priority initialization for device polling and noticed some vaguely related bugs. The idlepoll thread couldn't even get forward to its correct priority. (I don't believe in device polling, but it once gave be lower network latency and I tried to recover that.) This patch is for FreeBSD-9. X diff -c2 kern_poll.c~ kern_poll.c X *** kern_poll.c~ Wed Aug 6 20:13:13 2014 X --- kern_poll.c Fri May 8 01:45:11 2015 X *************** X *** 30,33 **** X --- 30,34 ---- X X #include "opt_device_polling.h" X + #include "opt_sched.h" X X #include X *************** X *** 38,45 **** X --- 39,48 ---- X #include X #include X + #include X #include /* needed by net/if.h */ X #include X #include X #include X + #include X X #include /* for IFF_* flags */ X *************** X *** 534,555 **** X X static void X ! poll_idle(void) X { X - struct thread *td = curthread; X - struct rtprio rtp; X - X - rtp.prio = RTP_PRIO_MAX; /* lowest priority */ X - rtp.type = RTP_PRIO_IDLE; X - PROC_SLOCK(td->td_proc); X - rtp_to_pri(&rtp, td); X - PROC_SUNLOCK(td->td_proc); This initialization was was not good originally, and has rotted. It depends rtp_to_pri() not only converting the rtp to td_priority, but also on the side effect of setting the new priority. This side effect no longer works here. rtp_to_prio() and its side effects are used for syscalls too. For syscalls, it is fragile but still works (perhaps delayed). The locking here has rotted. rtp_to_pri() is no longer locked by the spinlock that is aquired above. It is locked by a non-spinlock that is not acquired above. X X for (;;) { X if (poll_in_idle_loop && poll_handlers > 0) { X idlepoll_sleeping = 0; X ether_poll(poll_each_burst); X ! thread_lock(td); X mi_switch(SW_VOL, NULL); X ! thread_unlock(td); X } else { X idlepoll_sleeping = 1; X --- 537,559 ---- X X static void X ! poll_idle(void __unused *arg) X { X X + #define td curthread X + printf("poll_idle: pri %d upri %d baseupri %d lendupri %d\n", X + td->td_priority, td->td_user_pri, X + td->td_base_user_pri, td->td_lend_user_pri); X + #undef td Debugging code. td_lend_user_pri seems to be incorrectly initialized. I think it should equal the active priority (255 for idprio 31 threads like this one). But it is initialized to something like 120 (the base user priority) somewhere. This isn't a problem provided it is never used. But IIRC, when the above calls sched_prio() to try to change its priority to 255, the priority sticks at the "lent" priority of 120 although lending has never occurred. X for (;;) { X if (poll_in_idle_loop && poll_handlers > 0) { X idlepoll_sleeping = 0; X ether_poll(poll_each_burst); X ! #ifdef PREEMPTION X ! cpu_spinwait(); X ! #else X ! thread_lock(curthread); X mi_switch(SW_VOL, NULL); X ! thread_unlock(curthread); X ! #endif X } else { X idlepoll_sleeping = 1; Spinloops should sleep a bit if possible to avoid wasting power. With PREEMPTION, there is no need to yield, so we can sleep instead. Without PREEMPTION, I think sleeping a bit wouldn't be useful since most of the time is spent yielding. X *************** X *** 559,568 **** X } X X ! static struct proc *idlepoll; X ! static struct kproc_desc idlepoll_kp = { X ! "idlepoll", X ! poll_idle, X ! &idlepoll X ! }; X ! SYSINIT(idlepoll, SI_SUB_KTHREAD_VM, SI_ORDER_ANY, kproc_start, X ! &idlepoll_kp); X --- 563,602 ---- X } X X ! static void X ! poll_idle_start(void __unused *arg) X ! { X ! struct proc *p; X ! struct thread *td; X ! int error; X ! X ! error = kproc_create(poll_idle, NULL, &p, RFSTOPPED, 0, "idlepoll"); The fix is to copy this and the following code (except the debugging parts) from the vm idlezero thread. That thread is currently useless, but it always did idle priority initialization better. Idle priority thread initialization requires too much duplictation. kproc_start (used in the buggy version) is not useful for this, since sched_prio() at the start of the thread doesn't work and TDF_NOLOAD inside a thread is much further from working. X ! if (error) X ! panic("poll_idle_start: error %d\n", error); X ! td = FIRST_THREAD_IN_PROC(p); X ! thread_lock(td); X ! printf("poll_idle_start 0: pri %d upri %d baseupri %d lendupri %d\n", X ! td->td_priority, td->td_user_pri, X ! td->td_base_user_pri, td->td_lend_user_pri); More debugging code for td->td_lend_user_pri. I had this in rtp_to_pri() and haven't looked at its output here. In rtp_to_pri(), it is called for syscalls. The interaction of the 4 priorities printed is confusing. X ! /* X ! * XXX TDF_NOLOAD should be set automatically for kernel idle X ! * threads, and probably also for user idle threads, and possibly X ! * for many other kernel and user non-timesharing threads, since X ! * the load average is basically a SCHED_4BSD implementation X ! * detail for scheduling only timesharing threads. Having to X ! * set TDF_NOLOAD for ourself is especially inconvenient since X ! * setting it in a running thread just breaks decrementing the X ! * load count when then thread stops running. X ! */ X ! td->td_flags |= TDF_NOLOAD; This thread didn't set TDF_NOLOAD, so the load average was distorted. X ! sched_class(td, PRI_IDLE); X ! sched_prio(td, PRI_MAX_IDLE); X ! printf("poll_idle_start 1: pri %d upri %d baseupri %d lendupri %d\n", X ! td->td_priority, td->td_user_pri, X ! td->td_base_user_pri, td->td_lend_user_pri); X ! sched_add(td, SRQ_BORING); X ! printf("poll_idle_start 2: pri %d upri %d baseupri %d lendupri %d\n", X ! td->td_priority, td->td_user_pri, X ! td->td_base_user_pri, td->td_lend_user_pri); X ! thread_unlock(td); X ! } X ! SYSINIT(poll_idle, SI_SUB_KTHREAD_VM, SI_ORDER_ANY, poll_idle_start, NULL); Bruce